PREFACE


Cookery  is  become  an  art, 
a noble  science; 
cooks  are  gentlemen. 
— TITUS LIVIUS, Ab Urbe Condita XXXIX.vi
(Robert Burton, Anatomy of Melancholy 1.2.2.2)


This  BOOK  forms  a natural  sequel  to  the  material  on  information  structures  in 
Chapter  2 of  Volume  1,  because  it  adds  the  concept  of  linearly  ordered  data  to 
the  other  basic  structural  ideas. 

The title “Sorting and Searching” may sound as if this book is only for those
systems  programmers  who  are  concerned  with  the  preparation  of  general-purpose 
sorting  routines  or  applications  to  information  retrieval.  But  in  fact  the  area  of 
sorting  and  searching  provides  an  ideal  framework  for  discussing  a wide  variety 
of  important  general  issues: 

• How  are  good  algorithms  discovered? 

• How  can  given  algorithms  and  programs  be  improved? 

• How  can  the  efficiency  of  algorithms  be  analyzed  mathematically? 

• How  can  a person  choose  rationally  between  different  algorithms  for  the 
same  task? 

• In  what  senses  can  algorithms  be  proved  “best  possible”? 

• How  does  the  theory  of  computing  interact  with  practical  considerations? 

• How  can  external  memories  like  tapes,  drums,  or  disks  be  used  efficiently 
with  large  databases? 

Indeed,  I believe  that  virtually  every  important  aspect  of  programming  arises 
somewhere  in  the  context  of  sorting  or  searching! 

This  volume  comprises  Chapters  5 and  6 of  the  complete  series.  Chapter  5 
is  concerned  with  sorting  into  order;  this  is  a large  subject  that  has  been  divided 
chiefly  into  two  parts,  internal  sorting  and  external  sorting.  There  also  are 
supplementary  sections,  which  develop  auxiliary  theories  about  permutations 
(Section  5.1)  and  about  optimum  techniques  for  sorting  (Section  5.3).  Chapter  6 
deals  with  the  problem  of  searching  for  specified  items  in  tables  or  files;  this  is 
subdivided  into  methods  that  search  sequentially,  or  by  comparison  of  keys,  or 
by  digital  properties,  or  by  hashing,  and  then  the  more  difficult  problem  of 
secondary  key  retrieval  is  considered.  There  is  a surprising  amount  of  interplay 
between both chapters, with strong analogies tying the topics together. Two
important  varieties  of  information  structures  are  also  discussed,  in  addition  to 
those  considered  in  Chapter  2,  namely  priority  queues  (Section  5.2.3)  and  linear 
lists  represented  as  balanced  trees  (Section  6.2.3). 

Like  Volumes  1 and  2,  this  book  includes  a lot  of  material  that  does  not 
appear  in  other  publications.  Many  people  have  kindly  written  to  me  about 
their  ideas,  or  spoken  to  me  about  them,  and  I hope  that  I have  not  distorted 
the  material  too  badly  when  I have  presented  it  in  my  own  words. 

I have  not  had  time  to  search  the  patent  literature  systematically;  indeed, 
I decry  the  current  tendency  to  seek  patents  on  algorithms  (see  Section  5.4.5). 
If  somebody  sends  me  a copy  of  a relevant  patent  not  presently  cited  in  this 
book,  I will  dutifully  refer  to  it  in  future  editions.  However,  I want  to  encourage 
people  to  continue  the  centuries-old  mathematical  tradition  of  putting  newly 
discovered  algorithms  into  the  public  domain.  There  are  better  ways  to  earn  a 
living  than  to  prevent  other  people  from  making  use  of  one’s  contributions  to 
computer  science. 

Before  I retired  from  teaching,  I used  this  book  as  a text  for  a student’s 
second  course  in  data  structures,  at  the  junior-to-graduate  level,  omitting  most 
of  the  mathematical  material.  I also  used  the  mathematical  portions  of  this  book 
as  the  basis  for  graduate-level  courses  in  the  analysis  of  algorithms,  emphasizing 
especially  Sections  5.1,  5.2.2,  6.3,  and  6.4.  A graduate-level  course  on  concrete 
computational complexity could also be based on Sections 5.3 and 5.4.4, together
with  Sections  4.3.3,  4.6.3,  and  4.6.4  of  Volume  2. 

For the most part this book is self-contained, except for occasional discussions
relating to the MIX computer explained in Volume 1. Appendix B contains a
summary  of  the  mathematical  notations  used,  some  of  which  are  a little  different 
from  those  found  in  traditional  mathematics  books. 


Preface  to  the  Second  Edition 

This  new  edition  matches  the  third  editions  of  Volumes  1 and  2,  in  which  I have 
been able to celebrate the completion of TeX and METAFONT by applying those
systems  to  the  publications  they  were  designed  for. 

The  conversion  to  electronic  format  has  given  me  the  opportunity  to  go 
over  every  word  of  the  text  and  every  punctuation  mark.  I’ve  tried  to  retain 
the  youthful  exuberance  of  my  original  sentences  while  perhaps  adding  some 
more  mature  judgment.  Dozens  of  new  exercises  have  been  added;  dozens  of 
old  exercises  have  been  given  new  and  improved  answers.  Changes  appear 
everywhere,  but  most  significantly  in  Sections  5.1.4  (about  permutations  and 
tableaux),  5.3  (about  optimum  sorting),  5.4.9  (about  disk  sorting),  6.2.2  (about 
entropy),  6.4  (about  universal  hashing),  and  6.5  (about  multidimensional  trees 
and  tries). 
The Art of Computer Programming is, however, still a work in progress.
Research on sorting and searching continues to grow at a phenomenal rate.
Therefore  some  parts  of  this  book  are  headed  by  an  “under  construction”  icon, 
to  apologize  for  the  fact  that  the  material  is  not  up-to-date.  For  example,  if  I 
were  teaching  an  undergraduate  class  on  data  structures  today,  I would  surely 
discuss  randomized  structures  such  as  treaps  at  some  length;  but  at  present,  I 
am  only  able  to  cite  the  principal  papers  on  the  subject,  and  to  announce  plans 
for  a future  Section  6.2.5  (see  page  478).  My  files  are  bursting  with  important 
material  that  I plan  to  include  in  the  final,  glorious,  third  edition  of  Volume  3, 
perhaps  17  years  from  now.  But  I must  finish  Volumes  4 and  5 first,  and  I do 
not  want  to  delay  their  publication  any  more  than  absolutely  necessary. 

I am  enormously  grateful  to  the  many  hundreds  of  people  who  have  helped 
me  to  gather  and  refine  this  material  during  the  past  35  years.  Most  of  the 
hard  work  of  preparing  the  new  edition  was  accomplished  by  Phyllis  Winkler 
(who put the text of the first edition into TeX form), by Silvio Levy (who
edited  it  extensively  and  helped  to  prepare  several  dozen  illustrations),  and  by 
Jeffrey  Oldham  (who  converted  more  than  250  of  the  original  illustrations  to 
METAPOST format). The production staff at Addison-Wesley has also been
extremely  helpful,  as  usual. 

I have  corrected  every  error  that  alert  readers  detected  in  the  first  edition  — 
as  well  as  some  mistakes  that,  alas,  nobody  noticed  — and  I have  tried  to  avoid 
introducing  new  errors  in  the  new  material.  However,  I suppose  some  defects  still 
remain,  and  I want  to  fix  them  as  soon  as  possible.  Therefore  I will  cheerfully 
award  $2.56  to  the  first  finder  of  each  technical,  typographical,  or  historical  error. 
The  webpage  cited  on  page  iv  contains  a current  listing  of  all  corrections  that 
have  been  reported  to  me. 

Stanford,  California  D.  E.  K. 

February  1998 


There  are  certain  common  Privileges  of  a Writer, 
the  Benefit  whereof,  I hope,  there  will  be  no  Reason  to  doubt; 
Particularly,  that  where  I am  not  understood,  it  shall  be  concluded, 
that  something  very  useful  and  profound  is  coucht  underneath. 

— JONATHAN  SWIFT,  Tale  of  a Tub,  Preface  (1704) 


NOTES  ON  THE  EXERCISES 


The  EXERCISES  in  this  set  of  books  have  been  designed  for  self-study  as  well 
as  for  classroom  study.  It  is  difficult,  if  not  impossible,  for  anyone  to  learn  a 
subject  purely  by  reading  about  it,  without  applying  the  information  to  specific 
problems  and  thereby  being  encouraged  to  think  about  what  has  been  read. 
Furthermore,  we  all  learn  best  the  things  that  we  have  discovered  for  ourselves. 
Therefore  the  exercises  form  a major  part  of  this  work;  a definite  attempt  has 
been  made  to  keep  them  as  informative  as  possible  and  to  select  problems  that 
are  enjoyable  as  well  as  instructive. 

In  many  books,  easy  exercises  are  found  mixed  randomly  among  extremely 
difficult  ones.  A motley  mixture  is,  however,  often  unfortunate  because  readers 
like  to  know  in  advance  how  long  a problem  ought  to  take  — otherwise  they 
may  just  skip  over  all  the  problems.  A classic  example  of  such  a situation  is 
the  book  Dynamic  Programming  by  Richard  Bellman;  this  is  an  important, 
pioneering  work  in  which  a group  of  problems  is  collected  together  at  the  end 
of  some  chapters  under  the  heading  “Exercises  and  Research  Problems,”  with 
extremely  trivial  questions  appearing  in  the  midst  of  deep,  unsolved  problems. 
It  is  rumored  that  someone  once  asked  Dr.  Bellman  how  to  tell  the  exercises 
apart  from  the  research  problems,  and  he  replied,  “If  you  can  solve  it,  it  is  an 
exercise;  otherwise  it’s  a research  problem.” 

Good  arguments  can  be  made  for  including  both  research  problems  and 
very  easy  exercises  in  a book  of  this  kind;  therefore,  to  save  the  reader  from 
the  possible  dilemma  of  determining  which  are  which,  rating  numbers  have  been 
provided  to  indicate  the  level  of  difficulty.  These  numbers  have  the  following 
general  significance: 

Rating  Interpretation 

00  An  extremely  easy  exercise  that  can  be  answered  immediately  if  the 
material  of  the  text  has  been  understood;  such  an  exercise  can  almost 
always  be  worked  “in  your  head.” 

10  A simple  problem  that  makes  you  think  over  the  material  just  read,  but 
is  by  no  means  difficult.  You  should  be  able  to  do  this  in  one  minute  at 
most;  pencil  and  paper  may  be  useful  in  obtaining  the  solution. 

20  An average problem that tests basic understanding of the text material,
but you may need about fifteen or twenty minutes to answer it
completely. 
30  A problem of moderate difficulty and/or complexity; this one may
involve  more  than  two  hours’  work  to  solve  satisfactorily,  or  even  more 
if  the  TV  is  on. 

40  Quite  a difficult  or  lengthy  problem  that  would  be  suitable  for  a term 
project  in  classroom  situations.  A student  should  be  able  to  solve  the 
problem  in  a reasonable  amount  of  time,  but  the  solution  is  not  trivial. 

50  A research  problem  that  has  not  yet  been  solved  satisfactorily,  as  far 
as  the  author  knew  at  the  time  of  writing,  although  many  people  have 
tried.  If  you  have  found  an  answer  to  such  a problem,  you  ought  to 
write  it  up  for  publication;  furthermore,  the  author  of  this  book  would 
appreciate  hearing  about  the  solution  as  soon  as  possible  (provided  that 
it  is  correct). 

By  interpolation  in  this  “logarithmic”  scale,  the  significance  of  other  rating 
numbers becomes clear. For example, a rating of 17 would indicate an exercise
that  is  a bit  simpler  than  average.  Problems  with  a rating  of  50  that  are 
subsequently  solved  by  some  reader  may  appear  with  a 45  rating  in  later  editions 
of  the  book,  and  in  the  errata  posted  on  the  Internet  (see  page  iv). 

The  remainder  of  the  rating  number  divided  by  5 indicates  the  amount  of 
detailed work required. Thus, an exercise rated 24 may take longer to solve than
an  exercise  that  is  rated  25,  but  the  latter  will  require  more  creativity. 

The  author  has  tried  earnestly  to  assign  accurate  rating  numbers,  but  it  is 
difficult  for  the  person  who  makes  up  a problem  to  know  just  how  formidable  it 
will  be  for  someone  else  to  find  a solution;  and  everyone  has  more  aptitude  for 
certain  types  of  problems  than  for  others.  It  is  hoped  that  the  rating  numbers 
represent  a good  guess  at  the  level  of  difficulty,  but  they  should  be  taken  as 
general  guidelines,  not  as  absolute  indicators. 

This  book  has  been  written  for  readers  with  varying  degrees  of  mathematical 
training  and  sophistication;  as  a result,  some  of  the  exercises  are  intended  only  for 
the  use  of  more  mathematically  inclined  readers.  The  rating  is  preceded  by  an  M 
if  the  exercise  involves  mathematical  concepts  or  motivation  to  a greater  extent 
than  necessary  for  someone  who  is  primarily  interested  only  in  programming 
the  algorithms  themselves.  An  exercise  is  marked  with  the  letters  “HM”  if  its 
solution  necessarily  involves  a knowledge  of  calculus  or  other  higher  mathematics 
not developed in this book. An “HM” designation does not necessarily imply
difficulty. 

Some exercises are preceded by an arrowhead, “►”; this designates problems
that are especially instructive and especially recommended. Of course, no
reader/student  is  expected  to  work  all  of  the  exercises,  so  those  that  seem  to 
be  the  most  valuable  have  been  singled  out.  (This  distinction  is  not  meant  to 
detract  from  the  other  exercises!)  Each  reader  should  at  least  make  an  attempt 
to  solve  all  of  the  problems  whose  rating  is  10  or  less;  and  the  arrows  may  help 
to  indicate  which  of  the  problems  with  a higher  rating  should  be  given  priority. 

Solutions  to  most  of  the  exercises  appear  in  the  answer  section.  Please  use 
them  wisely;  do  not  turn  to  the  answer  until  you  have  made  a genuine  effort  to 
solve the problem by yourself, or unless you absolutely do not have time to work
this  particular  problem.  After  getting  your  own  solution  or  giving  the  problem  a 
decent  try,  you  may  find  the  answer  instructive  and  helpful.  The  solution  given 
will  often  be  quite  short,  and  it  will  sketch  the  details  under  the  assumption 
that  you  have  earnestly  tried  to  solve  it  by  your  own  means  first.  Sometimes  the 
solution  gives  less  information  than  was  asked;  often  it  gives  more.  It  is  quite 
possible  that  you  may  have  a better  answer  than  the  one  published  here,  or  you 
may  have  found  an  error  in  the  published  solution;  in  such  a case,  the  author 
will  be  pleased  to  know  the  details.  Later  printings  of  this  book  will  give  the 
improved  solutions  together  with  the  solver’s  name  where  appropriate. 

When  working  an  exercise  you  may  generally  use  the  answers  to  previous 
exercises,  unless  specifically  forbidden  from  doing  so.  The  rating  numbers  have 
been  assigned  with  this  in  mind;  thus  it  is  possible  for  exercise  n + 1 to  have  a 
lower  rating  than  exercise  n,  even  though  it  includes  the  result  of  exercise  n as 
a special  case. 


Summary of codes:

  00  Immediate                 ►   Recommended
  10  Simple (one minute)       M   Mathematically oriented
  20  Medium (quarter hour)     HM  Requiring “higher math”
  30  Moderately hard
  40  Term project
  50  Research problem

EXERCISES 

► 1. [00] What does the rating “M20” mean?

2.  [10]  Of  what  value  can  the  exercises  in  a textbook  be  to  the  reader? 

3. [HM45] Prove that when n is an integer, n > 2, the equation $x^n + y^n = z^n$ has
no solution in positive integers x, y, z.


Two  hours'  daily  exercise  . . . will  be  enough 
to  keep  a hack  fit  for  his  work. 
— M.  H.  MAHON,  The  Handy  Horse  Book  (1865) 
There is nothing more difficult to take in hand,
more  perilous  to  conduct,  or  more  uncertain  in  its  success, 
than  to  take  the  lead  in  the  introduction  of 
a new  order  of  things. 

— NICCOLO MACHIAVELLI, The Prince (1513)

“But you can’t look up all those license
numbers in time,” Drake objected.

“We don’t have to, Paul. We merely arrange a list
and look for duplications.”

— PERRY  MASON,  in  The  Case  of  the  Angry  Mourner  (1951) 

"Treesort"  Computer — With  this  new  'computer-approach' 
to  nature  study  you  can  quickly  identify  over  260 
different  trees  of  U.S.,  Alaska,  and  Canada, 
even  palms,  desert  trees,  and  other  exotics. 

To  sort,  you  simply  insert  the  needle. 

— EDMUND  SCIENTIFIC  COMPANY,  Catalog  (1964) 

In  THIS  CHAPTER  we  shall  study  a topic  that  arises  frequently  in  programming: 
the  rearrangement  of  items  into  ascending  or  descending  order.  Imagine  how 
hard  it  would  be  to  use  a dictionary  if  its  words  were  not  alphabetized!  We 
will  see  that,  in  a similar  way,  the  order  in  which  items  are  stored  in  computer 
memory  often  has  a profound  influence  on  the  speed  and  simplicity  of  algorithms 
that  manipulate  those  items. 

Although  dictionaries  of  the  English  language  define  “sorting”  as  the  process 
of separating or arranging things according to class or kind, computer programmers
traditionally use the word in the much more special sense of marshaling
things into ascending or descending order. The process should perhaps be called
ordering, not sorting; but anyone who tries to call it “ordering” is soon led
into  confusion  because  of  the  many  different  meanings  attached  to  that  word. 
Consider  the  following  sentence,  for  example:  “Since  only  two  of  our  tape  drives 
were  in  working  order,  I was  ordered  to  order  more  tape  units  in  short  order, 
in  order  to  order  the  data  several  orders  of  magnitude  faster.”  Mathematical 
terminology  abounds  with  still  more  senses  of  order  (the  order  of  a group,  the 
order  of  a permutation,  the  order  of  a branch  point,  relations  of  order,  etc.,  etc.). 
Thus  we  find  that  the  word  “order”  can  lead  to  chaos. 

Some  people  have  suggested  that  “sequencing”  would  be  the  best  name  for 
the  process  of  sorting  into  order;  but  this  word  often  seems  to  lack  the  right 
connotation, especially when equal elements are present, and it occasionally
conflicts  with  other  terminology.  It  is  quite  true  that  “sorting”  is  itself  an 
overused  word  (“I  was  sort  of  out  of  sorts  after  sorting  that  sort  of  data”), 
but  it  has  become  firmly  established  in  computing  parlance.  Therefore  we  shall 
use  the  word  “sorting”  chiefly  in  the  strict  sense  of  sorting  into  order,  without 
further  apologies. 

Some  of  the  most  important  applications  of  sorting  are: 

a) Solving the “togetherness” problem, in which all items with the same identification
are brought together. Suppose that we have 10000 items in arbitrary
order,  many  of  which  have  equal  values;  and  suppose  that  we  want  to  rearrange 
the  data  so  that  all  items  with  equal  values  appear  in  consecutive  positions.  This 
is  essentially  the  problem  of  sorting  in  the  older  sense  of  the  word;  and  it  can  be 
solved  easily  by  sorting  the  file  in  the  new  sense  of  the  word,  so  that  the  values 
are in ascending order, $v_1 \le v_2 \le \cdots \le v_{10000}$. The efficiency achievable in this
procedure  explains  why  the  original  meaning  of  “sorting”  has  changed. 
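A minimal Python sketch of this idea, assuming the items fit in memory and are mutually comparable; `bring_together` is a hypothetical name used only for illustration. Once the values are in nondecreasing order, equal values are adjacent, so a single left-to-right scan (here `itertools.groupby`, which groups consecutive equal elements) collects each class.

```python
from itertools import groupby


def bring_together(items):
    """Return the items grouped so that equal values are consecutive."""
    ordered = sorted(items)  # v_1 <= v_2 <= ... <= v_n
    # After sorting, equal values are adjacent, so groupby (which groups
    # *consecutive* equal elements) finds each class in one pass.
    return [list(run) for _, run in groupby(ordered)]


print(bring_together([3, 1, 3, 2, 1, 3]))  # [[1, 1], [2], [3, 3, 3]]
```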

b)  Matching  items  in  two  or  more  files.  If  several  files  have  been  sorted  into  the 
same  order,  it  is  possible  to  find  all  of  the  matching  entries  in  one  sequential  pass 
through  them,  without  backing  up.  This  is  the  principle  that  Perry  Mason  used 
to  help  solve  a murder  case  (see  the  quotation  at  the  beginning  of  this  chapter). 
We  can  usually  process  a list  of  information  most  quickly  by  traversing  it  in 
sequence  from  beginning  to  end,  instead  of  skipping  around  at  random  in  the 
list,  unless  the  entire  list  is  small  enough  to  fit  in  a high-speed  random-access 
memory.  Sorting  makes  it  possible  to  use  sequential  accessing  on  large  files,  as 
a feasible  substitute  for  direct  addressing. 
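A sketch of the one-pass matching just described, assuming both files are already sorted on the same key and are represented as Python lists; `matching_entries` and the license numbers are hypothetical. Each comparison advances at least one of the two positions, so neither file is ever backed up.

```python
def matching_entries(a, b):
    """Return the keys common to sorted lists a and b, found in one forward pass.

    Equal keys are paired one-to-one, so a key appearing twice in a and
    once in b is reported once.
    """
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1  # a[i] is smaller than every remaining b entry
        elif b[j] < a[i]:
            j += 1  # b[j] is smaller than every remaining a entry
        else:
            matches.append(a[i])
            i += 1
            j += 1
    return matches


# Hypothetical license numbers from two sorted lists.
seen_at_scene = sorted(["XY789", "AB123", "CD456"])
registered = sorted(["EF000", "CD456", "XY789"])
print(matching_entries(seen_at_scene, registered))  # ['CD456', 'XY789']
```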

c)  Searching  for  information  by  key  values.  Sorting  is  also  an  aid  to  searching, 
as  we  shall  see  in  Chapter  6,  hence  it  helps  us  make  computer  output  more 
suitable  for  human  consumption.  In  fact,  a listing  that  has  been  sorted  into 
alphabetic order often looks quite authoritative even when the associated numerical
information has been incorrectly computed.
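As a small illustration of why a sorted table aids searching, the sketch below uses binary search from Python's standard `bisect` module; it assumes the keys are already sorted, and `find` is a hypothetical helper. Chapter 6 treats searching methods properly; this only hints at how order helps.

```python
from bisect import bisect_left


def find(sorted_keys, key):
    """Return the position of key in sorted_keys, or -1 if it is absent."""
    i = bisect_left(sorted_keys, key)  # first position whose entry is >= key
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return -1


names = sorted(["knuth", "bellman", "mason", "drake"])
print(names)                 # ['bellman', 'drake', 'knuth', 'mason']
print(find(names, "knuth"))  # 2
print(find(names, "swift"))  # -1
```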

Although sorting has traditionally been used mostly for business data processing,
it is actually a basic tool that every programmer should keep in mind
for use in a wide variety of situations. We have discussed its use for simplifying
algebraic formulas, in exercise 2.3.2-17. The exercises below illustrate the
diversity  of  typical  applications. 

One  of  the  first  large-scale  software  systems  to  demonstrate  the  versatility 
of  sorting  was  the  LARC  Scientific  Compiler  developed  by  J.  Erdwinn,  D.  E. 
Ferguson,  and  their  associates  at  Computer  Sciences  Corporation  in  1960.  This 
optimizing  compiler  for  an  extended  FORTRAN  language  made  heavy  use  of 
sorting  so  that  the  various  compilation  algorithms  were  presented  with  relevant 
parts  of  the  source  program  in  a convenient  sequence.  The  first  pass  was  a 
lexical  scan  that  divided  the  FORTRAN  source  code  into  individual  tokens,  each 
representing  an  identifier  or  a constant  or  an  operator,  etc.  Each  token  was 
assigned  several  sequence  numbers;  when  sorted  on  the  name  and  an  appropriate 
sequence  number,  all  the  uses  of  a given  identifier  were  brought  together.  The 
“defining entries” by which a user would specify whether an identifier stood for a
function  name,  a parameter,  or  a dimensioned  variable  were  given  low  sequence 
numbers,  so  that  they  would  appear  first  among  the  tokens  having  a given 
identifier;  this  made  it  easy  to  check  for  conflicting  usage  and  to  allocate  storage 
with  respect  to  EQUIVALENCE  declarations.  The  information  thus  gathered  about 
each  identifier  was  now  attached  to  each  token;  in  this  way  no  “symbol  table” 
of  identifiers  needed  to  be  maintained  in  the  high-speed  memory.  The  updated 
tokens  were  then  sorted  on  another  sequence  number,  which  essentially  brought 
the  source  program  back  into  its  original  order  except  that  the  numbering  scheme 
was  cleverly  designed  to  put  arithmetic  expressions  into  a more  convenient 
“Polish  prefix”  form.  Sorting  was  also  used  in  later  phases  of  compilation,  to 
facilitate  loop  optimization,  to  merge  error  messages  into  the  listing,  etc.  In 
short,  the  compiler  was  designed  so  that  virtually  all  the  processing  could  be 
done sequentially from files that were stored in an auxiliary drum memory, since
appropriate  sequence  numbers  were  attached  to  the  data  in  such  a way  that  it 
could  be  sorted  into  various  convenient  arrangements. 
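The following Python sketch imitates, in a greatly simplified and hypothetical form, the two sorting passes described above: one sort brings every token for an identifier together with its defining entry first, and a second sort on the original position restores source order. The `Token` fields, the rank values, and the `kind` labels are assumptions made for illustration; the Polish-prefix renumbering and the drum-memory files of the real compiler are not modeled.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    name: str            # identifier text
    rank: int            # 0 for a defining entry, 1 for an ordinary use
    position: int        # original position in the source program
    kind: Optional[str]  # set only on defining entries (hypothetical labels)


def attach_definitions(tokens):
    # Sort 1: on (name, rank) so each identifier's tokens are adjacent,
    # with its defining entry (rank 0), if any, appearing first.
    by_name = sorted(tokens, key=lambda t: (t.name, t.rank, t.position))
    annotated = []
    current_name, current_kind = None, None
    for t in by_name:
        if t.name != current_name:  # a new identifier begins
            current_name, current_kind = t.name, None
        if t.rank == 0:             # remember what this identifier is
            current_kind = t.kind
        annotated.append(t._replace(kind=current_kind))
    # Sort 2: on the original position, restoring source order while every
    # token carries its identifier's information instead of a table lookup.
    return sorted(annotated, key=lambda t: t.position)


tokens = [
    Token("X", 1, 0, None),
    Token("F", 1, 1, None),
    Token("X", 1, 2, None),
    Token("F", 0, 3, "function"),
    Token("X", 0, 4, "array"),
]
result = attach_definitions(tokens)
print([(t.name, t.kind) for t in result])
# [('X', 'array'), ('F', 'function'), ('X', 'array'), ('F', 'function'), ('X', 'array')]
```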

Computer  manufacturers  of  the  1960s  estimated  that  more  than  25  percent 
of  the  running  time  on  their  computers  was  spent  on  sorting,  when  all  their 
customers  were  taken  into  account.  In  fact,  there  were  many  installations  in 
which  the  task  of  sorting  was  responsible  for  more  than  half  of  the  computing 
time.  From  these  statistics  we  may  conclude  that  either  (i)  there  are  many 
important  applications  of  sorting,  or  (ii)  many  people  sort  when  they  shouldn’t, 
or  (iii)  inefficient  sorting  algorithms  have  been  in  common  use.  The  real  truth 
probably  involves  all  three  of  these  possibilities,  but  in  any  event  we  can  see  that 
sorting  is  worthy  of  serious  study,  as  a practical  matter. 

Even if sorting were almost useless, there would be plenty of rewarding reasons
for studying it anyway! The ingenious algorithms that have been discovered
show  that  sorting  is  an  extremely  interesting  topic  to  explore  in  its  own  right. 
Many  fascinating  unsolved  problems  remain  in  this  area,  as  well  as  quite  a few 
solved  ones. 

From  a broader  perspective  we  will  find  also  that  sorting  algorithms  make  a 
valuable  case  study  of  how  to  attack  computer  programming  problems  in  general. 
Many  important  principles  of  data  structure  manipulation  will  be  illustrated  in 
this  chapter.  We  will  be  examining  the  evolution  of  various  sorting  techniques 
in  an  attempt  to  indicate  how  the  ideas  were  discovered  in  the  first  place.  By 
extrapolating  this  case  study  we  can  learn  a good  deal  about  strategies  that  help 
us  design  good  algorithms  for  other  computer  problems. 

Sorting  techniques  also  provide  excellent  illustrations  of  the  general  ideas 
involved  in  the  analysis  of  algorithms  — the  ideas  used  to  determine  performance 
characteristics  of  algorithms  so  that  an  intelligent  choice  can  be  made  between 
competing  methods.  Readers  who  are  mathematically  inclined  will  find  quite  a 
few  instructive  techniques  in  this  chapter  for  estimating  the  speed  of  computer 
algorithms  and  for  solving  complicated  recurrence  relations.  On  the  other  hand, 
the  material  has  been  arranged  so  that  readers  without  a mathematical  bent  can 
safely  skip  over  these  calculations. 
Before going on, we ought to define our problem a little more clearly, and
introduce  some  terminology.  We  are  given  N items 

$R_1, R_2, \ldots, R_N$

to be sorted; we shall call them records, and the entire collection of N records
will be called a file. Each record $R_j$ has a key, $K_j$, which governs the sorting
process. Additional data, besides the key, is usually also present; this extra
“satellite information” has no effect on sorting except that it must be carried
along  as  part  of  each  record. 

An  ordering  relation  “<”  is  specified  on  the  keys  so  that  the  following 
conditions  are  satisfied  for  any  key  values  a,  b,  c: 

i) Exactly one of the possibilities a < b, a = b, b < a is true. (This is called
the law of trichotomy.)

ii) If a < b and b < c, then a < c. (This is the familiar law of transitivity.)

Properties  (i)  and  (ii)  characterize  the  mathematical  concept  of  linear  ordering, 
also  called  total  ordering.  Any  relationship  “<”  satisfying  these  two  properties 
can  be  sorted  by  most  of  the  methods  to  be  mentioned  in  this  chapter,  although 
some  sorting  techniques  are  designed  to  work  only  with  numerical  or  alphabetic 
keys  that  have  the  usual  ordering. 
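For a relation on a small finite set of keys, the two laws can be checked by brute force. The sketch below is a hypothetical helper, not something from the text; it tests every pair for trichotomy and every triple for transitivity, and assumes Python's `==` is the intended notion of key equality.

```python
from itertools import product


def is_linear_order(keys, less):
    """Brute-force check of trichotomy and transitivity for `less` on keys."""
    for a, b in product(keys, repeat=2):
        # i) Trichotomy: exactly one of a < b, a = b, b < a is true.
        outcomes = [bool(less(a, b)), a == b, bool(less(b, a))]
        if outcomes.count(True) != 1:
            return False
    for a, b, c in product(keys, repeat=3):
        # ii) Transitivity: a < b and b < c must imply a < c.
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True


# A hypothetical "rock, paper, scissors" relation: x < y means y beats x.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def loses_to(x, y):
    return (y, x) in beats


print(is_linear_order([3, 1, 2], lambda a, b: a < b))           # True
print(is_linear_order(["rock", "paper", "scissors"], loses_to))  # False
```

The second relation satisfies trichotomy but not transitivity, so it is not a linear ordering.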

The goal of sorting is to determine a permutation $p(1)\,p(2)\ldots p(N)$ of the
indices $\{1, 2, \ldots, N\}$ that will put the keys into nondecreasing order:

$$K_{p(1)} \le K_{p(2)} \le \cdots \le K_{p(N)}. \qquad (1)$$

The  sorting  is  called  stable  if  we  make  the  further  requirement  that  records  with 
equal  keys  should  retain  their  original  relative  order.  In  other  words,  stable 
sorting  has  the  additional  property  that 

$$p(i) < p(j) \quad \text{whenever } K_{p(i)} = K_{p(j)} \text{ and } i < j. \qquad (2)$$
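A sketch of how a permutation satisfying (1) can be computed and how conditions (1) and (2) can be checked, assuming the keys are held in a Python list. Indices are 1-based to match $K_1, \ldots, K_N$; the function names are hypothetical. The approach relies on the documented guarantee that Python's built-in `sorted` is stable, so ties are left in increasing index order.

```python
def sorting_permutation(keys):
    """Return p(1) ... p(N) as a list of 1-based indices satisfying (1).

    sorted() is guaranteed stable, so indices with equal keys stay in
    increasing order, which is exactly condition (2).
    """
    return sorted(range(1, len(keys) + 1), key=lambda i: keys[i - 1])


def satisfies_1_and_2(keys, p):
    """Check conditions (1) and (2) for a candidate permutation p (1-based)."""
    def K(i):
        return keys[i - 1]  # K_i with 1-based indexing, as in the text

    n = len(p)
    ordered = all(K(p[i]) <= K(p[i + 1]) for i in range(n - 1))  # (1)
    stable = all(p[i] < p[j]
                 for i in range(n) for j in range(i + 1, n)
                 if K(p[i]) == K(p[j]))                         # (2)
    return ordered and stable


keys = [503, 87, 512, 87, 61]
p = sorting_permutation(keys)
print(p)                                         # [5, 2, 4, 1, 3]
print(satisfies_1_and_2(keys, p))                # True
print(satisfies_1_and_2(keys, [5, 4, 2, 1, 3]))  # False: equal keys swapped
```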

In  some  cases  we  will  want  the  records  to  be  physically  rearranged  in  storage 
so  that  their  keys  are  in  order.  But  in  other  cases  it  will  be  sufficient  merely  to 
have  an  auxiliary  table  that  specifies  the  permutation  in  some  way,  so  that  the 
records  can  be  accessed  in  order  of  their  keys. 
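The two options can be sketched in Python as follows, assuming each record is a `(key, satellite)` tuple; the helper names and sample data are hypothetical. The auxiliary table holds only positions, so the records themselves never move, which is typically attractive when records are long; the in-place version reorders the records instead.

```python
def key_order_view(records):
    """Auxiliary table: record positions (0-based) listed in order of their keys.

    The records themselves are left where they are.
    """
    return sorted(range(len(records)), key=lambda j: records[j][0])


def rearrange_in_place(records):
    """Physical rearrangement: reorder the list itself by key (stable)."""
    records.sort(key=lambda r: r[0])


records = [("c", "third"), ("a", "first"), ("b", "second")]
p = key_order_view(records)
print(p)                        # [1, 2, 0]
print([records[j] for j in p])  # [('a', 'first'), ('b', 'second'), ('c', 'third')]
rearrange_in_place(records)
print(records)                  # [('a', 'first'), ('b', 'second'), ('c', 'third')]
```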

A few  of  the  sorting  methods  in  this  chapter  assume  the  existence  of  either 
or both of the values “∞” and “−∞”, which are defined to be greater than or
less than all keys, respectively:

$$-\infty < K_j < \infty, \qquad \text{for } 1 \le j \le N. \qquad (3)$$

Such extreme values are occasionally used as artificial keys or as sentinel indicators.
The case of equality is excluded in (3); if equality can occur, the algorithms
can  be  modified  so  that  they  will  still  work,  but  usually  at  the  expense  of  some 
elegance  and  efficiency. 
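One common use of the artificial key $-\infty$ is as a sentinel in straight insertion sorting: placing it ahead of the data lets the inner loop stop without a separate bounds test. The sketch below assumes numeric keys, so that Python's `float("-inf")` compares less than every key as in (3); the function name is hypothetical, and the code illustrates the sentinel idea rather than transcribing any algorithm in the text.

```python
def insertion_sort_with_sentinel(keys):
    """Straight insertion sort that uses -infinity as a sentinel (sketch).

    Assumes numeric keys, so float("-inf") is less than every key, as in (3).
    """
    a = [float("-inf")] + list(keys)  # a[0] is the sentinel; data in a[1:]
    for j in range(2, len(a)):
        k = a[j]
        i = j - 1
        # No separate "i > 0" test is needed: the scan must stop at a[0],
        # since -inf > k is false for every key k.
        while a[i] > k:
            a[i + 1] = a[i]  # shift a larger key one place to the right
            i -= 1
        a[i + 1] = k
    return a[1:]


print(insertion_sort_with_sentinel([503, 87, 512, 61]))  # [61, 87, 503, 512]
```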

Sorting  can  be  classified  generally  into  internal  sorting,  in  which  the  records 
are kept entirely in the computer’s high-speed random-access memory, and external
sorting, when more records are present than can be held comfortably in
memory at once. Internal sorting allows more flexibility in the structuring and
accessing  of  the  data,  while  external  sorting  shows  us  how  to  live  with  rather 
stringent  accessing  constraints. 
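To give a feel for the external case, here is a simplified Python sketch, assuming a memory that holds at most `memory_size` records: the input is cut into chunks that are sorted internally into runs, and the runs are then combined by a sequential k-way merge with the standard `heapq.merge`. Everything lives in an ordinary Python list, purely for illustration; a real external sort would keep the runs on tapes or disks, and the names are hypothetical.

```python
import heapq


def external_sort(records, memory_size):
    """Simulated external sort: internal sorts of memory-sized runs, then a merge.

    memory_size (assumed >= 1) stands for how many records fit in memory.
    """
    runs = []
    for start in range(0, len(records), memory_size):
        # Each chunk is small enough to be sorted internally into a run.
        runs.append(sorted(records[start:start + memory_size]))
    # heapq.merge reads every run strictly front to back, which is the kind
    # of sequential access that tapes or disks allow.
    return list(heapq.merge(*runs))


print(external_sort([5, 3, 8, 1, 9, 2, 7], 3))  # [1, 2, 3, 5, 7, 8, 9]
```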

The  time  required  to  sort  N records,  using  a decent  general-purpose  sorting 
algorithm, is roughly proportional to N log N; we make about log N “passes”
over  the  data.  This  is  the  minimum  possible  time,  as  we  shall  see  in  Section  5.3.1, 
if  the  records  are  in  random  order  and  if  sorting  is  done  by  pairwise  comparisons 
of  keys.  Thus  if  we  double  the  number  of  records,  it  will  take  a little  more 
than  twice  as  long  to  sort  them,  all  other  things  being  equal.  (Actually,  as  N 
approaches infinity, a better indication of the time needed to sort is $N(\log N)^2$,
if  the  keys  are  distinct,  since  the  size  of  the  keys  must  grow  at  least  as  fast  as 
log  N;  but  for  practical  purposes,  N never  really  approaches  infinity.) 
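The phrase “a little more than twice as long” can be made concrete under the idealized assumption that the running time is exactly $cN\lg N$ for some constant $c$, ignoring lower-order terms:

$$\frac{c\,(2N)\lg(2N)}{c\,N\lg N} = \frac{2(\lg N + 1)}{\lg N} = 2\left(1 + \frac{1}{\lg N}\right).$$

For example, $N = 1024$ gives $\lg N = 10$ and a ratio of $2.2$, while $N = 2^{20}$ gives $2.1$; the excess over 2 shrinks slowly as $N$ grows.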

On  the  other  hand,  if  the  keys  are  known  to  be  randomly  distributed  with 
respect  to  some  continuous  numerical  distribution,  we  will  see  that  sorting  can 
be  accomplished  in  O(N)  steps  on  the  average. 
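One standard way to obtain linear expected time in this setting is distribution (bucket) sorting. The Python sketch below assumes the keys are uniformly distributed in $[0, 1)$ and is not necessarily the method developed later in this chapter; `distribution_sort` is a hypothetical name. With $N$ buckets of equal width, each bucket receives about one key on average, so the per-bucket sorts are cheap in expectation.

```python
def distribution_sort(keys):
    """Bucket sort for keys assumed to be uniformly distributed in [0, 1)."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        # Bucket b holds keys in [b/n, (b+1)/n); min() guards against
        # floating-point rounding pushing k*n up to n.
        buckets[min(int(k * n), n - 1)].append(k)
    result = []
    for bucket in buckets:
        bucket.sort()          # about one key per bucket on average
        result.extend(bucket)  # buckets cover increasing key ranges
    return result


print(distribution_sort([0.42, 0.07, 0.93, 0.5, 0.31]))
# [0.07, 0.31, 0.42, 0.5, 0.93]
```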

EXERCISES  — First  Set 

1.  [M20]  Prove,  from  the  laws  of  trichotomy  and  transitivity,  that  the  permutation 
$p(1)\,p(2)\ldots p(N)$ is uniquely determined when the sorting is assumed to be stable.

2. [21] Assume that each record $R_j$ in a certain file contains two keys, a “major key”
$K_j$ and a “minor key” $k_j$, with a linear ordering < defined on each of the sets of keys.
Then we can define lexicographic order between pairs of keys $(K, k)$ in the usual way:

$$(K_i, k_i) < (K_j, k_j) \quad \text{if } K_i < K_j \text{ or if } K_i = K_j \text{ and } k_i < k_j.$$

Alice  took  this  file  and  sorted  it  first  on  the  major  keys,  obtaining  n groups  of 
records  with  equal  major  keys  in  each  group, 

Ap(i)  — Ap(q)  <-  -^p(*i+i)  — * — A”p(i2)  "^  * * * ^p(in—i+ 1)  — * * * — A^p(in), 

where  i„  = N.  Then  she  sorted  each  of  the  n groups  Rp(i  _1+i),  ■ ■ ■ ,Rp(i  ) on  their 
minor  keys. 

Bill  took  the  same  original  file  and  sorted  it  first  on  the  minor  keys;  then  he  took 
the  resulting  file,  and  sorted  it  on  the  major  keys. 

Chris  took  the  same  original  file  and  did  a single  sorting  operation  on  it,  using 
lexicographic  order  on  the  major  and  minor  keys  (Kj,  kj). 

Did  everyone  obtain  the  same  result? 

3. [M25] Let $<$ be a relation on $K_1, \ldots, K_N$ that satisfies the law of trichotomy but
not the transitive law. Prove that even without the transitive law it is possible to sort
the records in a stable manner, meeting conditions (1) and (2); in fact, there are at
least three arrangements that satisfy the conditions!

► 4.  [21]  Lexicographers  don’t  actually  use  strict  lexicographic  order  in  dictionaries, 
because  uppercase  and  lowercase  letters  must  be  interfiled.  Thus  they  want  an  ordering 
such  as  this: 

a < A < aa < AA < AAA < Aachen < aah < ··· < zzz < ZZZ.

Explain  how  to  implement  dictionary  order. 
► 5. [M28] Design a binary code for all nonnegative integers so that if n is encoded as
the string $\rho(n)$ we have $m < n$ if and only if $\rho(m)$ is lexicographically less than $\rho(n)$.
Moreover, $\rho(m)$ should not be a prefix of $\rho(n)$ for any $m \ne n$. If possible, the length of
$\rho(n)$ should be at most $\lg n + O(\log\log n)$ for all large n. (Such a code is useful if we
want to sort texts that mix words and numbers, or if we want to map arbitrarily large
alphabets into binary strings.)

6. [15] Mr. B. C. Dull (a MIX programmer) wanted to know if the number stored in
location A is greater than, less than, or equal to the number stored in location B. So
he wrote ‘LDA A; SUB B’ and tested whether register A was positive, negative, or zero.
What serious mistake did he make, and what should he have done instead?

7. [17] Write a MIX subroutine for multiprecision comparison of keys, having the
following specifications:

Calling sequence: JMP COMPARE

Entry conditions: rI1 = n; CONTENTS(A + k) = $a_k$ and CONTENTS(B + k) = $b_k$, for
$1 \le k \le n$; assume that $n \ge 1$.

Exit conditions: CI = GREATER, if $(a_n, \ldots, a_1) > (b_n, \ldots, b_1)$;
CI = EQUAL, if $(a_n, \ldots, a_1) = (b_n, \ldots, b_1)$;
CI = LESS, if $(a_n, \ldots, a_1) < (b_n, \ldots, b_1)$;
rX and rI1 are possibly affected.

Here the relation $(a_n, \ldots, a_1) < (b_n, \ldots, b_1)$ denotes lexicographic ordering from left to
right; that is, there is an index j such that $a_k = b_k$ for $n \ge k > j$, but $a_j < b_j$.

► 8.  [30]  Locations  A and  B contain  two  numbers  a and  b,  respectively.  Show  that  it  is 
possible  to  write  a MIX  program  that  computes  and  stores  min(a,  b)  in  location  C,  without 
using  any  jump  operators.  (Caution:  Since  you  will  not  be  able  to  test  whether  or  not 
arithmetic  overflow  has  occurred,  it  is  wise  to  guarantee  that  overflow  is  impossible 
regardless  of  the  values  of  a and  b.) 

9. [M27] After N independent, uniformly distributed random variables between 0
and 1 have been sorted into nondecreasing order, what is the probability that the rth
smallest of these numbers is $\le x$?

EXERCISES  — Second  Set 

Each  of  the  following  exercises  states  a problem  that  a computer  programmer  might 
have  had  to  solve  in  the  old  days  when  computers  didn’t  have  much  random-access 
memory.  Suggest  a “good”  way  to  solve  the  problem,  assuming  that  only  a few  thousand 
words  of  internal  memory  are  available,  supplemented  by  about  half  a dozen  tape  units 
(enough  tape  units  for  sorting).  Algorithms  that  work  well  under  such  limitations  also 
prove  to  be  efficient  on  modern  machines. 

10.  [15]  You  are  given  a tape  containing  one  million  words  of  data.  How  do  you 
determine  how  many  distinct  words  are  present  on  the  tape? 

11.  [18]  You  are  the  U.  S.  Internal  Revenue  Service;  you  receive  millions  of  “informa- 
tion forms  from  organizations  telling  how  much  income  they  have  paid  to  people,  and 
millions  of  tax  forms  from  people  telling  how  much  income  they  have  been  paid.  How 
do  you  catch  people  who  don’t  report  all  of  their  income? 

12. [M25] (Transposing a matrix.) You are given a magnetic tape containing one
million words, representing the elements of a 1000 × 1000 matrix stored in order by rows:
$a_{1,1}\, a_{1,2} \ldots a_{1,1000}\, a_{2,1} \ldots a_{2,1000} \ldots a_{1000,1000}$. How do you create a tape in which the
elements are stored by columns $a_{1,1}\, a_{2,1} \ldots a_{1000,1}\, a_{1,2} \ldots a_{1000,2} \ldots a_{1000,1000}$ instead?
(Try to make less than a dozen passes over the data.)

13. [M26] How could you “shuffle” a large file of N words into a random rearrangement?

14.  [20]  You  are  working  with  two  computer  systems  that  have  different  conventions 
for  the  “collating  sequence”  that  defines  the  ordering  of  alphameric  characters.  How  do 
you  make  one  computer  sort  alphameric  files  in  the  order  used  by  the  other  computer? 

15. [15] You are given a list of the names of a fairly large number of people born in
the  U.S.A.,  together  with  the  name  of  the  state  where  they  were  born.  How  do  you 
count  the  number  of  people  born  in  each  state?  (Assume  that  nobody  appears  in  the 
list  more  than  once.) 

16.  [20]  In  order  to  make  it  easier  to  make  changes  to  large  FORTRAN  programs,  you 
want  to  design  a “cross-reference”  routine;  such  a routine  takes  FORTRAN  programs 
as  input  and  prints  them  together  with  an  index  that  shows  each  use  of  each  identifier 
(that  is,  each  name)  in  the  program.  How  should  such  a routine  be  designed? 

► 17.  [33]  ( Library  card  sorting.)  Before  the  days  of  computerized  databases,  every 
library  maintained  a catalog  of  cards  so  that  users  could  find  the  books  they  wanted. 
But  the  task  of  putting  catalog  cards  into  an  order  convenient  for  human  use  turned  out 
to  be  quite  complicated  as  library  collections  grew.  The  following  “alphabetical”  listing 
indicates  many  of  the  procedures  recommended  in  the  American  Library  Association 
Rules  for  Filing  Catalog  Cards  (Chicago:  1942): 


Text of card                                    Remarks

R. Accademia nazionale dei Lincei, Rome         Ignore foreign royalty (except British)
1812; ein historischer Roman.                   Achtzehnhundertzwölf
Bibliothèque d’histoire révolutionnaire.        Treat apostrophe as space in French
Bibliothèque des curiosités.                    Ignore accents on letters
Brown, Mrs. J. Crosby                           Ignore designation of rank
Brown, John                                     Names with dates follow those without
Brown, John, mathematician                      . . . and the latter are subarranged
Brown, John, of Boston                          by descriptive words
Brown, John, 1715-1766                          Arrange identical names by birthdate
BROWN, JOHN, 1715-1766                          Works “about” follow works “by”
Brown, John, d. 1811                            Sometimes birthdate must be estimated
Brown, Dr. John, 1810-1882                      Ignore designation of rank
Brown-Williams, Reginald Makepeace              Treat hyphen as space
Brown America.                                  Book titles follow compound names
Brown & Dallison’s Nevada directory.            & in English becomes “and”
Brownjohn, Alan
Den’, Vladimir Eduardovich, 1867-               Ignore apostrophe in names
The den.                                        Ignore an initial article
Den lieben langen Tag.                          . . . provided it’s in nominative case
Dix, Morgan, 1827-1908                          Names precede words
1812 ouverture.                                 Dix-huit cent douze
Le XIXe siècle français.                        Dix-neuvième
The 1847 issue of U. S. stamps.                 Eighteen forty-seven
1812 overture.                                  Eighteen twelve
I am a mathematician.                           (a book by Norbert Wiener)
IBM journal of research and development.        Initials are like one-letter words
ha-I ha-ehad.                                   Ignore initial article
Ia; a love story.                               Ignore punctuation in titles
International Business Machines Corporation
al-Khuwarizmī, Muhammad ibn Musa, fl. 813-846   Ignore initial “al-” in Arabic names
Labour. A magazine for all workers.             Respell it “Labor”
Labor research association
Labour, see Labor                               Cross-reference card
McCall’s cookbook                               Ignore apostrophe in English
McCarthy, John, 1927-                           Mc = Mac
Machine-independent computer programming.       Treat hyphen as space
MacMahon, Maj. Percy Alexander, 1854-1929       Ignore designation of rank
Mrs. Dalloway.                                  “Mrs.” = “Mistress”
Mistress of mistresses.
Royal society of London                         Don’t ignore British royalty
St. Petersburger Zeitung.                       “St.” = “Saint”, even in German
Saint-Saëns, Camille, 1835-1921                 Treat hyphen as space
Ste-Marie, Gaston P                             Sainte
Seminumerical algorithms.                       (a book by Donald Ervin Knuth)
Uncle Tom’s cabin.                              (a book by Harriet Beecher Stowe)
U. S. bureau of the census.                     “U. S.” = “United States”
Vandermonde, Alexandre Théophile, 1735-1796
Van Valkenburg, Mac Elwyn, 1921-                Ignore space after prefix in surnames
Von Neumann, John, 1903-1957
The whole art of legerdemain.                   Ignore initial article
Who’s afraid of Virginia Woolf?                 Ignore apostrophe in English
Wijngaarden, Adriaan van, 1916-                 Surname begins with uppercase letter

(Most of these rules are subject to certain exceptions, and there are many other rules
not illustrated here.)

If  you  were  given  the  job  of  sorting  large  quantities  of  catalog  cards  by  computer, 
and  eventually  maintaining  a very  large  file  of  such  cards,  and  if  you  had  no  chance  to 
change  these  long-standing  policies  of  card  filing,  how  would  you  arrange  the  data  in 
such  a way  that  the  sorting  and  merging  operations  are  facilitated? 

18. [M25] (E. T. Parker.) Leonhard Euler once conjectured [Nova Acta Acad. Sci.
Petropolitanae 13 (1795), 45-63, §3; written in 1778] that there are no solutions to the
equation

$$u^6 + v^6 + w^6 + x^6 + y^6 = z^6$$

in positive integers u, v, w, x, y, z. At the same time he conjectured that

$$x_1^n + \cdots + x_{n-1}^n = x_n^n$$

would have no positive integer solutions, for all $n \ge 3$, but this more general conjecture
was disproved by the computer-discovered identity $27^5 + 84^5 + 110^5 + 133^5 = 144^5$;
see L. J. Lander, T. R. Parkin, and J. L. Selfridge, Math. Comp. 21 (1967), 446-459.
Infinitely many counterexamples when n = 4 were subsequently found by Noam Elkies
[Math. Comp. 51 (1988), 825-835]. Can you think of a way in which sorting would
help in the search for counterexamples to Euler’s conjecture when n = 6?

► 19. [24] Given a file containing a million or so distinct 30-bit binary words $x_1, \ldots, x_N$,
what is a good way to find all complementary pairs $\{x_i, x_j\}$ that are present? (Two
words are complementary when one has 0 wherever the other has 1, and conversely;
thus they are complementary if and only if their sum is $(11\ldots1)_2$, when they are
treated as binary numbers.)

► 20. [25] Given a file containing 1000 30-bit words $x_1, \ldots, x_{1000}$, how would you prepare
a list of all pairs $(x_i, x_j)$ such that $x_i = x_j$ except in at most two bit positions?

21.  [22]  How  would  you  go  about  looking  for  five-letter  anagrams  such  as  CARET, 
CARTE,  CATER,  CRATE,  REACT,  RECTA,  TRACE;  CRUEL,  LUCRE,  ULCER;  DOWRY,  ROWDY,  WORDY? 
[One  might  wish  to  know  whether  there  are  any  sets  of  ten  or  more  five-letter  English 
anagrams  besides  the  remarkable  set 

APERS,  ASPER,  PARES,  PARSE,  PEARS,  PRASE,  PRESA,  RAPES,  REAPS,  SPAER,  SPARE,  SPEAR, 

to  which  we  might  add  the  French  word  APRES.] 

22.  [M28]  Given  the  specifications  of  a fairly  large  number  of  directed  graphs,  what 
approach  will  be  useful  for  grouping  the  isomorphic  ones  together?  (Directed  graphs  are 
isomorphic if there is a one-to-one correspondence between their vertices and a one-to-one
correspondence between their arcs, where the correspondences preserve incidence
between  vertices  and  arcs.) 

23.  [30]  In  a certain  group  of  4096  people,  everyone  has  about  100  acquaintances. 
A file  has  been  prepared  listing  all  pairs  of  people  who  are  acquaintances.  (The  relation 
is  symmetric:  If  x is  acquainted  with  y,  then  y is  acquainted  with  x.  Therefore  the  file 
contains  roughly  200,000  entries.)  How  would  you  design  an  algorithm  to  list  all  the 
k-person cliques in this group of people, given k? (A clique is an instance of mutual
acquaintances:  Everyone  in  the  clique  is  acquainted  with  everyone  else.)  Assume  that 
there  are  no  cliques  of  size  25,  so  the  total  number  of  cliques  cannot  be  enormous. 

► 24.  [30]  Three  million  men  with  distinct  names  were  laid  end-to-end,  reaching  from 
New  York  to  California.  Each  participant  was  given  a slip  of  paper  on  which  he  wrote 
down  his  own  name  and  the  name  of  the  person  immediately  west  of  him  in  the  line. 
The  man  at  the  extreme  western  end  didn’t  understand  what  to  do,  so  he  threw  his 
paper  away;  the  remaining  2,999,999  slips  of  paper  were  put  into  a huge  basket  and 
taken  to  the  National  Archives  in  Washington,  D.C.  Here  the  contents  of  the  basket 
were  shuffled  completely  and  transferred  to  magnetic  tapes. 

At  this  point  an  information  scientist  observed  that  there  was  enough  information 
on  the  tapes  to  reconstruct  the  list  of  people  in  their  original  order.  And  a computer 
scientist  discovered  a way  to  do  the  reconstruction  with  fewer  than  1000  passes  through 
the data tapes, using only sequential accessing of tape files and a small amount of
random-access  memory.  How  was  that  possible? 

[In other words, given the pairs $(x_i, x_{i+1})$, for $1 \le i < N$, in random order,
where the $x_i$ are distinct, how can the sequence $x_1 x_2 \ldots x_N$ be obtained, restricting
all  operations  to  serial  techniques  suitable  for  use  with  magnetic  tapes?  This  is  the 
problem  of  sorting  into  order  when  there  is  no  easy  way  to  tell  which  of  two  given  keys 
precedes  the  other;  we  have  already  raised  this  question  as  part  of  exercise  2.2.3-25.] 
25. [M21] (Discrete logarithms.) You know that p is a (rather large) prime number,
and that a is a primitive root modulo p. Therefore, for all b in the range $1 \le b < p$,
there is a unique n such that $a^n \bmod p = b$, $1 \le n < p$. (This n is called the index
of b modulo p, with respect to a.) Explain how to find n, given b, without needing
O(n) steps. [Hint: Let $m = \lceil \sqrt{p}\, \rceil$ and try to solve $a^{m n_1} \equiv b\, a^{-n_2}$ (modulo p) for
$0 \le n_1, n_2 < m$.]
*5.1. COMBINATORIAL PROPERTIES OF PERMUTATIONS

A PERMUTATION  of  a finite  set  is  an  arrangement  of  its  elements  into  a row. 
Permutations  are  of  special  importance  in  the  study  of  sorting  algorithms,  since 
they  represent  the  unsorted  input  data.  In  order  to  study  the  efficiency  of 
different  sorting  methods,  we  will  want  to  be  able  to  count  the  number  of 
permutations  that  cause  a certain  step  of  a sorting  procedure  to  be  executed 
a certain  number  of  times. 

We  have,  of  course,  met  permutations  frequently  in  previous  chapters.  For 
example, in Section 1.2.5 we discussed two basic theoretical methods of constructing
the n! permutations of n objects; in Section 1.3.3 we analyzed some
algorithms  dealing  with  the  cycle  structure  and  multiplicative  properties  of 
permutations;  in  Section  3.3.2  we  studied  their  “runs  up”  and  “runs  down.” 
The purpose of the present section is to study several other properties of permutations,
and to consider the general case where equal elements are allowed to
appear.  In  the  course  of  this  study  we  will  learn  a good  deal  about  combinatorial 
mathematics. 

The  properties  of  permutations  are  sufficiently  pleasing  to  be  interesting  in 
their  own  right,  and  it  is  convenient  to  develop  them  systematically  in  one  place 
instead  of  scattering  the  material  throughout  this  chapter.  But  readers  who 
are  not  mathematically  inclined  and  readers  who  are  anxious  to  dive  right  into 
sorting  techniques  are  advised  to  go  on  to  Section  5.2  immediately,  since  the 
present  section  actually  has  little  direct  connection  to  sorting. 


*5.1.1.  Inversions 


Let $a_1 a_2 \ldots a_n$ be a permutation of the set $\{1, 2, \ldots, n\}$. If $i < j$ and $a_i > a_j$,
the pair $(a_i, a_j)$ is called an inversion of the permutation; for example, the
permutation 3 1 4 2 has three inversions: (3, 1), (3, 2), and (4, 2). Each inversion is
a pair of elements that is out of sort, so the only permutation with no inversions is
the sorted permutation 1 2 ... n. This connection with sorting is the chief reason
why we will be so interested in inversions, although we have already used the
concept to analyze a dynamic storage allocation algorithm (see exercise 2.2.2-9).
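A direct, brute-force reading of this definition, as a Python sketch (the helper names `inversions` and `inv` are ours, not the text's): it examines every pair i < j, so it makes on the order of n² comparisons.

```python
def inversions(a):
    """List the inversions (a_i, a_j) with i < j and a_i > a_j, ordered by i, then j."""
    n = len(a)
    return [(a[i], a[j]) for i in range(n) for j in range(i + 1, n) if a[i] > a[j]]


def inv(a):
    """Number of inversions of the sequence a."""
    return len(inversions(a))
```

For example, `inversions([3, 1, 4, 2])` gives `[(3, 1), (3, 2), (4, 2)]`, and `inv([1, 2, 3, 4])` is 0.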

The concept of inversions was introduced by G. Cramer in 1750 [Intr. à
l’Analyse des Lignes Courbes Algébriques (Geneva: 1750), 657-659; see Thomas
Muir, Theory of Determinants 1 (1906), 11-14], in connection with his famous
rule for solving linear equations. In essence, Cramer defined the determinant of
an n × n matrix $(x_{ij})$ in the following way:

$$\sum (-1)^{\operatorname{inv}(a_1 a_2 \ldots a_n)}\, x_{1a_1} x_{2a_2} \ldots x_{na_n},$$

summed over all permutations $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, where $\operatorname{inv}(a_1 a_2 \ldots a_n)$
is the number of inversions of the permutation.
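Cramer’s definition can be evaluated literally for small matrices. The Python sketch below (function names are illustrative) sums over all n! permutations, so it is only practical for tiny n; it is meant to show the role of inv, not to suggest a way to compute determinants. Columns are indexed from 0 in the code, which shifts every $a_i$ by one and leaves the inversion count unchanged.

```python
from itertools import permutations


def inv(a):
    """Number of pairs i < j with a[i] > a[j]."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def cramer_det(x):
    """Determinant of the square matrix x (a list of rows) by Cramer's definition:
    the sum of (-1)^inv(a) * x[1][a_1] * ... * x[n][a_n] over all permutations a."""
    n = len(x)
    total = 0
    for a in permutations(range(n)):    # 0-based column indices
        term = (-1) ** inv(a)
        for i in range(n):
            term *= x[i][a[i]]
        total += term
    return total
```

For example, `cramer_det([[1, 2], [3, 4]])` returns -2.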

The inversion table $b_1 b_2 \ldots b_n$ of the permutation $a_1 a_2 \ldots a_n$ is obtained by
letting $b_j$ be the number of elements to the left of j that are greater than j.
In other words, $b_j$ is the number of inversions whose second component is j.
It  follows,  for  example,  that  the  permutation 

5 9 1 8 2 6 4 7 3 (1)

has  the  inversion  table 

2 3 6 4 0 2 2 1 0,  (2) 

since  5 and  9 are  to  the  left  of  1;  5,  9,  8 are  to  the  left  of  2;  etc.  This  permutation 
has  20  inversions  in  all.  By  definition  the  numbers  bj  will  always  satisfy 

$$0 \le b_1 \le n-1, \quad 0 \le b_2 \le n-2, \quad \ldots, \quad 0 \le b_{n-1} \le 1, \quad b_n = 0. \qquad (3)$$
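The inversion table can be computed straight from its definition. In this Python sketch (the name `inversion_table` is ours), the permutation holds the values 1, ..., n and the returned list holds $b_1, \ldots, b_n$ in positions 0, ..., n − 1.

```python
def inversion_table(a):
    """Return [b_1, ..., b_n], where b_j counts the elements greater than j
    that appear to the left of j in the permutation a of {1, ..., n}."""
    position = {value: i for i, value in enumerate(a)}
    return [sum(1 for v in a[:position[j]] if v > j) for j in range(1, len(a) + 1)]
```

Applied to (1), this gives `[2, 3, 6, 4, 0, 2, 2, 1, 0]`, matching (2); the entries sum to 20 and respect the bounds (3).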

Perhaps  the  most  important  fact  about  inversions  is  the  simple  observation 
that  an  inversion  table  uniquely  determines  the  corresponding  permutation.  We 
can go back from any inversion table $b_1 b_2 \ldots b_n$ satisfying (3) to the unique
permutation that produces it, by successively determining the relative placement
of the elements $n, n-1, \ldots, 1$ (in this order). For example, we can construct the
permutation corresponding to (2) as follows: Write down the number 9; then
place 8 after 9, since $b_8 = 1$. Similarly, put 7 after both 8 and 9, since $b_7 = 2$.
Then 6 must follow two of the numbers already written down, because $b_6 = 2$;
the  partial  result  so  far  is  therefore 


9 8 6 7. 

Continue  by  placing  5 at  the  left,  since  b5  = 0;  put  4 after  four  of  the  numbers; 
and  put  3 after  six  numbers  (namely  at  the  extreme  right),  giving 

5 9 8 6 4 7 3. 

The  insertion  of  2 and  1 in  an  analogous  way  yields  (1). 

This correspondence is important because we can often translate a problem
stated in terms of permutations into an equivalent problem stated in terms of
inversion tables, and the latter problem may be easier to solve. For example,
consider the simplest question of all: How many permutations of $\{1, 2, \ldots, n\}$ are
possible? The answer must be the number of possible inversion tables, and they
are easily enumerated since there are n choices for $b_1$, independently $n-1$ choices
for $b_2$, ..., 1 choice for $b_n$, making $n(n-1)\ldots1 = n!$ choices in all. Inversion tables are
easy to count, because the b’s are completely independent of each other, while
the a’s must be mutually distinct.
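The reconstruction described above, and the count of inversion tables, can be sketched in Python as follows (names are illustrative). Inserting n, n − 1, ..., 1 in turn, each new element j is smaller than everything already written, so placing it at position $b_j$ puts exactly $b_j$ larger elements to its left; elements inserted later are smaller and do not change that count.

```python
from itertools import product


def permutation_from_table(b):
    """Rebuild the permutation of {1, ..., n} whose inversion table is b = [b_1, ..., b_n]."""
    result = []
    for j in range(len(b), 0, -1):      # j = n, n-1, ..., 1
        result.insert(b[j - 1], j)      # b_j larger elements end up to the left of j
    return result


def all_inversion_tables(n):
    """Every b_1 ... b_n with 0 <= b_j <= n - j, as in (3): n * (n-1) * ... * 1 tables."""
    return product(*(range(n - j + 1) for j in range(1, n + 1)))
```

For example, `permutation_from_table([2, 3, 6, 4, 0, 2, 2, 1, 0])` returns `[5, 9, 1, 8, 2, 6, 4, 7, 3]`, and the 24 tables for n = 4 yield 24 distinct permutations.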

In  Section  1.2.10  we  analyzed  the  number  of  local  maxima  that  occur  when 
a permutation  is  read  from  right  to  left;  in  other  words,  we  counted  how  many 
elements  are  larger  than  any  of  their  successors.  (The  right-to-left  maxima  in  (1), 
for example, are 3, 7, 8, and 9.) This is the number of j such that $b_j$ has its
maximum value, $n - j$. Since $b_1$ will equal $n - 1$ with probability $1/n$, and
(independently) $b_2$ will be equal to $n - 2$ with probability $1/(n - 1)$, etc., it is
clear by consideration of the inversions that the average number of right-to-left
maxima is

$$\frac{1}{n} + \frac{1}{n-1} + \cdots + 1 = H_n.$$

The corresponding generating function is also easily derived in a similar way.

Fig. 1. The truncated octahedron, which shows the change in inversions when adjacent
elements of a permutation are interchanged.
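The average number of right-to-left maxima can be confirmed exhaustively for small n. This Python sketch (function names are ours) uses exact rational arithmetic: it counts the right-to-left maxima of every permutation of {1, ..., n} and compares their mean with $H_n$.

```python
from fractions import Fraction
from itertools import permutations


def right_to_left_maxima(a):
    """Count the elements that are larger than every element to their right."""
    count, best = 0, 0
    for x in reversed(a):
        if x > best:
            count += 1
            best = x
    return count


def average_maxima(n):
    """Exact mean of right_to_left_maxima over all permutations of {1, ..., n}."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(right_to_left_maxima(p) for p in perms), len(perms))


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

For the permutation (1), `right_to_left_maxima` returns 4 (the elements 3, 7, 8, 9).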

If we interchange two adjacent elements of a permutation, it is easy to see
that the total number of inversions will increase or decrease by unity. Figure 1
shows the 24 permutations of {1, 2, 3, 4}, with lines joining permutations that
differ by an interchange of adjacent elements; following any line downward inverts
exactly one new pair. Hence the number of inversions of a permutation π is the
length of a downward path from 1234 to π in Fig. 1; all such paths must have
the same length.
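One consequence is that any method that sorts by repeatedly interchanging an adjacent out-of-order pair performs exactly inv(π) interchanges, in whatever order it chooses them, since each such interchange removes exactly one inversion. The Python sketch below (names are ours) uses bubble sort as one such method and checks the count against a brute-force inversion count.

```python
from itertools import permutations


def inv(a):
    """Number of pairs i < j with a[i] > a[j]."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def adjacent_interchanges(a):
    """Bubble-sort a copy of a; return how many adjacent interchanges were made.
    Each interchange of an out-of-order adjacent pair removes exactly one inversion."""
    a = list(a)
    count = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                count += 1
                swapped = True
    return count


def check_all(n):
    """True if the interchange count equals inv(p) for every permutation p of {1..n}."""
    return all(adjacent_interchanges(p) == inv(p) for p in permutations(range(1, n + 1)))
```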

Incidentally,  the  diagram  in  Fig.  1 may  be  viewed  as  a three-dimensional 
solid,  the  “truncated  octahedron,”  which  has  8 hexagonal  faces  and  6 square 
faces.  This  is  one  of  the  classical  uniform  polyhedra  attributed  to  Archimedes 
(see  exercise  10). 

The  reader  should  not  confuse  inversions  of  a permutation  with  the  inverse 
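of a permutation. Recall that we can write a permutation in two-line form

$$\begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ a_1 & a_2 & a_3 & \ldots & a_n \end{pmatrix}; \qquad (4)$$

the inverse $a'_1 a'_2 a'_3 \ldots a'_n$ of this permutation is the permutation obtained by
interchanging the two rows and then sorting the columns into increasing order
of the new top row:

$$\begin{pmatrix} a_1 & a_2 & a_3 & \ldots & a_n \\ 1 & 2 & 3 & \ldots & n \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ a'_1 & a'_2 & a'_3 & \ldots & a'_n \end{pmatrix}.$$

For example, the inverse of 5 9 1 8 2 6 4 7 3 is 3 5 9 7 1 6 8 4 2, since

$$\begin{pmatrix} 5 & 9 & 1 & 8 & 2 & 6 & 4 & 7 & 3 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 3 & 5 & 9 & 7 & 1 & 6 & 8 & 4 & 2 \end{pmatrix}.$$

Another way to define the inverse is to say that $a'_j = k$ if and only if $a_k = j$.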
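Both descriptions of the inverse translate directly into code. In this Python sketch (names are illustrative), `inverse` uses the rule $a'_j = k$ if and only if $a_k = j$, while `inverse_by_sorting` swaps the two rows of the two-line form and sorts the columns by the new top row.

```python
def inverse(a):
    """a'_j = k if and only if a_k = j (values and positions are 1-based)."""
    result = [0] * len(a)
    for k, j in enumerate(a, start=1):
        result[j - 1] = k
    return result


def inverse_by_sorting(a):
    """Swap the rows of the two-line form, then sort columns by the new top row."""
    columns = sorted(zip(a, range(1, len(a) + 1)))   # (new top, new bottom) pairs
    return [bottom for _, bottom in columns]
```

For example, both functions map `[5, 9, 1, 8, 2, 6, 4, 7, 3]` to `[3, 5, 9, 7, 1, 6, 8, 4, 2]`.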

The inverse of a permutation was first defined by H. A. Rothe [in Sammlung
combinatorisch-analytischer Abhandlungen, edited by C. F. Hindenburg, 2
(Leipzig: 1800), 263-305], who noticed an interesting connection between inverses
and inversions: The inverse of a permutation has exactly as many inversions as
the permutation itself. Rothe’s proof of this fact was not the simplest possible
one, but it is instructive and quite pretty nevertheless. We construct an n × n
chessboard having a dot in column j of row i whenever $a_i = j$. Then we put
×’s in all squares that have dots lying both below (in the same column) and to
their right (in the same row). For example, the diagram for 5 9 1 8 2 6 4 7 3 is


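× × × × • · · · ·
× × × × · × × × •
• · · · · · · · ·
· × × × · × × • ·
· • · · · · · · ·
· · × × · • · · ·
· · × • · · · · ·
· · × · · · • · ·
· · • · · · · · ·
(5)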
The number of ×’s is the number of inversions, since it is easy to see that $b_j$ is the
number of ×’s in column j. Now if we transpose the diagram, interchanging
rows and columns, we get the diagram corresponding to the inverse of the
original permutation. Hence the number of ×’s (the number of inversions) is
the same in both cases. Rothe used this fact to prove that the determinant of a
matrix is unchanged when the matrix is transposed.
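Rothe’s construction can be replayed mechanically. The Python sketch below (function names are ours) builds the diagram as strings, using • for dots, × for crosses, and · for empty squares, then checks on small cases that transposing the diagram of a permutation gives the diagram of its inverse, so the cross counts agree.

```python
from itertools import permutations


def inverse(a):
    """a'_j = k if and only if a_k = j (1-based)."""
    result = [0] * len(a)
    for k, j in enumerate(a, start=1):
        result[j - 1] = k
    return result


def rothe_diagram(a):
    """Row i has a dot in column a_i, and a cross in column j when the dot of
    column j lies below row i (a'_j > i) and the dot of row i lies to the right (a_i > j)."""
    n, a_inv = len(a), inverse(a)
    rows = []
    for i in range(1, n + 1):
        row = ""
        for j in range(1, n + 1):
            if a[i - 1] == j:
                row += "•"
            elif a[i - 1] > j and a_inv[j - 1] > i:
                row += "×"
            else:
                row += "·"
        rows.append(row)
    return rows


def crosses(a):
    """Total number of crosses in the diagram."""
    return sum(row.count("×") for row in rothe_diagram(a))


def transpose(rows):
    """Interchange rows and columns of a diagram given as equal-length strings."""
    return ["".join(column) for column in zip(*rows)]


def check_all(n):
    """Transposed diagram equals the inverse's diagram, for every permutation of {1..n}."""
    return all(transpose(rothe_diagram(p)) == rothe_diagram(inverse(p))
               for p in permutations(range(1, n + 1)))
```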

The analysis of several sorting algorithms involves the knowledge of how
many permutations of n elements have exactly k inversions. Let us denote that
number by $I_n(k)$. Table 1 lists the first few values of this function.

By considering the inversion table $b_1 b_2 \ldots b_n$, it is obvious that $I_n(0) = 1$,
$I_n(1) = n - 1$, and there is a symmetry property

$$I_n\Bigl(\binom{n}{2} - k\Bigr) = I_n(k). \qquad (6)$$
Table 1

PERMUTATIONS WITH k INVERSIONS

n  I_n(0)  I_n(1)  I_n(2)  I_n(3)  I_n(4)  I_n(5)  I_n(6)  I_n(7)  I_n(8)  I_n(9)  I_n(10) I_n(11)
1  1       0       0       0       0       0       0       0       0       0       0       0
2  1       1       0       0       0       0       0       0       0       0       0       0
3  1       2       2       1       0       0       0       0       0       0       0       0
4  1       3       5       6       5       3       1       0       0       0       0       0
5  1       4       9       15      20      22      20      15      9       4       1       0
6  1       5       14      29      49      71      90      101     101     90      71      49

Furthermore, since each of the b’s can be chosen independently of the others, it
is not difficult to see that the generating function

$$G_n(z) = I_n(0) + I_n(1)z + I_n(2)z^2 + \cdots \qquad (7)$$

satisfies $G_n(z) = (1 + z + \cdots + z^{n-1})\, G_{n-1}(z)$; hence it has the comparatively
simple form noticed by O. Rodrigues [J. de Math. 4 (1839), 236-240]:

$$(1 + z + \cdots + z^{n-1}) \ldots (1 + z)(1) = (1 - z^n) \ldots (1 - z^2)(1 - z)/(1 - z)^n. \qquad (8)$$
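The relation $G_n(z) = (1 + z + \cdots + z^{n-1})\, G_{n-1}(z)$ yields the rows of Table 1 by repeated polynomial multiplication. A Python sketch (the name `inversion_counts` is ours), representing a polynomial by its list of coefficients:

```python
def inversion_counts(n):
    """Return [I_n(0), I_n(1), ..., I_n(n(n-1)/2)], the coefficients of G_n(z),
    using G_m(z) = (1 + z + ... + z^(m-1)) G_(m-1)(z) for m = 1, ..., n, with G_0(z) = 1."""
    g = [1]
    for m in range(1, n + 1):
        product = [0] * (len(g) + m - 1)
        for k, c in enumerate(g):
            for d in range(m):              # multiply by 1 + z + ... + z^(m-1)
                product[k + d] += c
        g = product
    return g
```

For example, `inversion_counts(4)` returns `[1, 3, 5, 6, 5, 3, 1]`, the row n = 4 of Table 1; the full rows are symmetric, as (6) requires, and sum to n!.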

From this generating function, we can easily extend Table 1, and we can verify
that the numbers below the zigzag line in that table satisfy

$$I_n(k) = I_n(k-1) + I_{n-1}(k), \qquad \text{for } k < n. \qquad (9)$$

(This relation does not hold above the zigzag line.) A more complicated argument
(see exercise 14) shows that, in fact, we have the formulas

$$I_n(2) = \binom{n}{2} - 1, \qquad n \ge 2;$$
$$I_n(3) = \binom{n+1}{3} - \binom{n}{1}, \qquad n \ge 3;$$
$$I_n(4) = \binom{n+2}{4} - \binom{n+1}{2}, \qquad n \ge 4;$$
$$I_n(5) = \binom{n+3}{5} - \binom{n+2}{3} + 1, \qquad n \ge 5;$$

in general, the formula for $I_n(k)$ contains about $1.6\sqrt{k}$ terms:

$$I_n(k) = \binom{n+k-2}{k} - \binom{n+k-3}{k-2} + \binom{n+k-6}{k-5} + \binom{n+k-8}{k-7} - \cdots + (-1)^j \biggl( \binom{n+k-u_j-j-1}{k-u_j-j} + \binom{n+k-u_j-1}{k-u_j} \biggr) + \cdots, \qquad n \ge k, \qquad (10)$$

where $u_j = (3j^2 - j)/2$ is a so-called “pentagonal number.”
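Formula (10) can be checked numerically against the coefficients of (8). In the Python sketch below (helper names are ours), `binom` treats a binomial coefficient with negative lower index as 0, which is how the terms of (10) vanish once $u_j$ exceeds k; the reference values come from multiplying out the product in (8).

```python
from math import comb


def binom(top, bottom):
    """Binomial coefficient, taken to be 0 when bottom < 0 and 1 when bottom == 0."""
    if bottom < 0:
        return 0
    if bottom == 0:
        return 1
    return comb(top, bottom)


def pentagonal_formula(n, k):
    """Evaluate the right-hand side of (10); valid for n >= k."""
    total = binom(n + k - 2, k) - binom(n + k - 3, k - 2)
    j = 2
    while (3 * j * j - j) // 2 <= k:            # u_j = (3j^2 - j)/2
        u = (3 * j * j - j) // 2
        total += (-1) ** j * (binom(n + k - u - j - 1, k - u - j) + binom(n + k - u - 1, k - u))
        j += 1
    return total


def coefficients_via_product(n):
    """Coefficients of (1 + z + ... + z^(n-1)) ... (1 + z)(1), i.e. I_n(0), I_n(1), ..."""
    g = [1]
    for m in range(1, n + 1):
        g = [sum(g[k - d] for d in range(m) if 0 <= k - d < len(g))
             for k in range(len(g) + m - 1)]
    return g
```

For example, `pentagonal_formula(6, 6)` evaluates to 210 − 126 + 6 = 90, which is $I_6(6)$ in Table 1.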

If we divide $G_n(z)$ by n! we get the generating function $g_n(z)$ for the
probability distribution of the number of inversions in a random permutation
of n elements. This is the product

$$g_n(z) = h_1(z)\, h_2(z) \ldots h_n(z), \qquad (11)$$

where $h_k(z) = (1 + z + \cdots + z^{k-1})/k$ is the generating function for the uniform
distribution of a random nonnegative integer less than k. It follows that

$$\operatorname{mean}(g_n) = \operatorname{mean}(h_1) + \operatorname{mean}(h_2) + \cdots + \operatorname{mean}(h_n) = 0 + \frac{1}{2} + \cdots + \frac{n-1}{2} = \frac{n(n-1)}{4}; \qquad (12)$$

$$\operatorname{var}(g_n) = \operatorname{var}(h_1) + \operatorname{var}(h_2) + \cdots + \operatorname{var}(h_n) = 0 + \frac{1}{4} + \cdots + \frac{n^2-1}{12} = \frac{n(2n+5)(n-1)}{72}. \qquad (13)$$

So the average number of inversions is rather large, about $\frac14 n^2$; the standard
deviation is also rather large, about $\frac16 n^{3/2}$.
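Formulas (12) and (13) can be confirmed exactly for small n by brute force. This Python sketch (names are ours) enumerates all n! permutations and uses rational arithmetic, so no rounding is involved.

```python
from fractions import Fraction
from itertools import permutations


def inv(a):
    """Number of pairs i < j with a[i] > a[j]."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def inversion_stats(n):
    """Exact mean and variance of inv over all n! permutations of {1, ..., n}."""
    counts = [inv(p) for p in permutations(range(1, n + 1))]
    mean = Fraction(sum(counts), len(counts))
    variance = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, variance
```

For n = 4 this gives a mean of 3 and a variance of 13/6, in agreement with 4·3/4 and 4·13·3/72.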

A remarkable discovery about the distribution of inversions was made by
P. A. MacMahon [Amer. J. Math. 35 (1913), 281-322]. Let us define the index
of the permutation $a_1 a_2 \ldots a_n$ as the sum of all subscripts j such that $a_j > a_{j+1}$,
$1 \le j < n$. For example, the index of 5 9 1 8 2 6 4 7 3 is 2 + 4 + 6 + 8 = 20. By
coincidence the index is the same as the number of inversions in this case. If we
list the 24 permutations of {1, 2, 3, 4}, namely


Permutation   Index   Inversions        Permutation   Index   Inversions
1 2 3 4         0         0             3|1 2 4         1         2
1 2 4|3         3         1             3|1 4|2         4         3
1 3|2 4         2         1             3|2|1 4         3         3
1 3 4|2         3         2             3|2 4|1         4         4
1 4|2 3         2         2             3 4|1 2         2         4
1 4|3|2         5         3             3 4|2|1         5         5
2|1 3 4         1         1             4|1 2 3         1         3
2|1 4|3         4         2             4|1 3|2         4         4
2 3|1 4         2         2             4|2|1 3         3         4
2 3 4|1         3         3             4|2 3|1         4         5
2 4|1 3         2         3             4|3|1 2         3         5
2 4|3|1         5         4             4|3|2|1         6         6

we  see  that  the  number  of  permutations  having  a given  index,  k,  is  the  same  as 
the  number  having  k inversions. 
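The equidistribution can also be observed by machine for small n. The Python sketch below (names are ours) computes MacMahon’s index and the inversion count of every permutation and tallies both.

```python
from collections import Counter
from itertools import permutations


def index(a):
    """MacMahon's index: the sum of all j (1-based) with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(a)) if a[j - 1] > a[j])


def inv(a):
    """Number of pairs i < j with a[i] > a[j]."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def tallies(n):
    """(Counter of index values, Counter of inversion counts) over all permutations of {1..n}."""
    perms = list(permutations(range(1, n + 1)))
    return Counter(index(p) for p in perms), Counter(inv(p) for p in perms)
```

For n = 4 both counters come out as {0: 1, 1: 3, 2: 5, 3: 6, 4: 5, 5: 3, 6: 1}, matching the row n = 4 of Table 1.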

At  first  this  fact  might  appear  to  be  almost  obvious,  but  further  scrutiny 
makes  it  very  mysterious.  MacMahon  gave  an  ingenious  indirect  proof,  as  follows: 
Let $\operatorname{ind}(a_1 a_2 \ldots a_n)$ be the index of the permutation $a_1 a_2 \ldots a_n$, and let

$$H_n(z) = \sum z^{\operatorname{ind}(a_1 a_2 \ldots a_n)} \qquad (14)$$

be the corresponding generating function; the sum in (14) is over all permutations
of $\{1, 2, \ldots, n\}$. We wish to show that $H_n(z) = G_n(z)$. For this purpose we will
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define  a one-to-one  correspondence  between  arbitrary  n-tuples  (qi,  q2, . . . , qn)  of 
nonnegative  integers,  on  the  one  hand,  and  ordered  pairs  of  n-tuples 

((al,  a2)  • • ■ , an),  (Pl,P2,---,Pn)) 

on  the  other  hand,  where  a\  a2  . . . an  is  a permutation  of  the  indices  {1, 2, . . . , n} 
and  Pi  > Vi  > • • • > Pn  > 0.  This  correspondence  will  satisfy  the  condition 

9i  + 92  H Vqn-  ind(ai  a2  . . ■ an)  + (pi  + p2  H h p„).  (15) 

The generating function $\sum z^{q_1 + q_2 + \cdots + q_n}$, summed over all $n$-tuples of nonnegative integers $(q_1, q_2, \ldots, q_n)$, is $Q_n(z) = 1/(1 - z)^n$; and the generating function $\sum z^{p_1 + p_2 + \cdots + p_n}$, summed over all $n$-tuples of integers $(p_1, p_2, \ldots, p_n)$ such that $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$, is

$$P_n(z) = 1/\bigl((1 - z)(1 - z^2) \ldots (1 - z^n)\bigr), \qquad (16)$$

as shown in exercise 15. In view of (15), the one-to-one correspondence we are about to establish will prove that $Q_n(z) = H_n(z) P_n(z)$, that is,

$$H_n(z) = Q_n(z)/P_n(z). \qquad (17)$$

But $Q_n(z)/P_n(z)$ is $G_n(z)$, by (8).
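The identity $Q_n(z) = H_n(z) P_n(z)$ can also be checked numerically, coefficient by coefficient. In this Python sketch (helper names are illustrative), $H_n$ comes from enumerating permutations, $P_n$ from the usual recurrence for partitions into parts of size at most $n$, and $Q_n(z) = 1/(1-z)^n$ has coefficients $\binom{m+n-1}{n-1}$; every series is truncated at $z^d$, where $d = 20$ is an arbitrary choice.

```python
from itertools import permutations
from math import comb


def index(a):
    """MacMahon's index: the sum of the 1-based positions j with a_j > a_{j+1}."""
    return sum(j + 1 for j in range(len(a) - 1) if a[j] > a[j + 1])


def h_coeffs(n):
    """Coefficients of H_n(z), obtained by enumerating all permutations of {1..n}."""
    c = [0] * (n * (n - 1) // 2 + 1)          # the index never exceeds n(n-1)/2
    for p in permutations(range(1, n + 1)):
        c[index(p)] += 1
    return c


def p_coeffs(n, d):
    """Coefficients of z^0..z^d in P_n(z) = 1/((1-z)(1-z^2)...(1-z^n)),
    i.e. the numbers of partitions of m into parts no larger than n."""
    c = [1] + [0] * d
    for part in range(1, n + 1):
        for m in range(part, d + 1):
            c[m] += c[m - part]
    return c


def q_equals_h_times_p(n, d=20):
    """Check Q_n(z) = H_n(z) P_n(z) coefficient by coefficient up to z^d."""
    h, p = h_coeffs(n), p_coeffs(n, d)
    product = [sum(h[i] * p[m - i] for i in range(min(m, len(h) - 1) + 1))
               for m in range(d + 1)]
    q = [comb(m + n - 1, n - 1) for m in range(d + 1)]   # coefficients of 1/(1-z)^n
    return product == q


print([q_equals_h_times_p(n) for n in range(1, 7)])
```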

The desired correspondence is defined by a simple sorting procedure: Any $n$-tuple $(q_1, q_2, \ldots, q_n)$ can be rearranged into nonincreasing order $q_{a_1} \ge q_{a_2} \ge \cdots \ge q_{a_n}$ in a stable manner, where $a_1 a_2 \ldots a_n$ is a permutation such that $q_{a_j} = q_{a_{j+1}}$ implies $a_j < a_{j+1}$. We set $(p_1, p_2, \ldots, p_n) = (q_{a_1}, q_{a_2}, \ldots, q_{a_n})$ and then, for $1 \le j < n$, subtract 1 from each of $p_1, \ldots, p_j$ for each $j$ such that $a_j > a_{j+1}$. We still have $p_1 \ge p_2 \ge \cdots \ge p_n$, because $p_j$ was strictly greater than $p_{j+1}$ whenever $a_j > a_{j+1}$. The resulting pair $\bigl((a_1, a_2, \ldots, a_n), (p_1, p_2, \ldots, p_n)\bigr)$ satisfies (15), because the total reduction of the $p$'s is $\operatorname{ind}(a_1 a_2 \ldots a_n)$. For example, if $n = 9$ and $(q_1, \ldots, q_9) = (3, 1, 4, 1, 5, 9, 2, 6, 5)$, we find $a_1 \ldots a_9 = 685931724$ and $(p_1, \ldots, p_9) = (5, 2, 2, 2, 2, 2, 1, 1, 1)$.
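The sorting procedure translates almost line for line into code. This Python sketch (the name `macmahon_forward` is hypothetical) relies on the fact that Python's `sorted` is stable, so equal values of $q$ keep their indices in increasing order, exactly as the rule above requires; it reproduces the example with $n = 9$.

```python
def index(a):
    """MacMahon's index: the sum of the 1-based positions j with a_j > a_{j+1}."""
    return sum(j + 1 for j in range(len(a) - 1) if a[j] > a[j + 1])


def macmahon_forward(q):
    """Map an n-tuple q of nonnegative integers to the pair (a, p) of the text.

    a is a permutation of 1..n (a list), p is nonincreasing, and
    sum(q) == index(a) + sum(p), which is relation (15).
    """
    n = len(q)
    # Stable sort of the indices 1..n into nonincreasing order of q; ties keep
    # increasing index order, i.e. q[a_j] == q[a_{j+1}] implies a_j < a_{j+1}.
    a = sorted(range(1, n + 1), key=lambda i: -q[i - 1])
    p = [q[i - 1] for i in a]
    # For each descent a_j > a_{j+1} (1 <= j < n), subtract 1 from p_1, ..., p_j.
    for j in range(1, n):
        if a[j - 1] > a[j]:
            for i in range(j):
                p[i] -= 1
    return a, p


a, p = macmahon_forward((3, 1, 4, 1, 5, 9, 2, 6, 5))
print("".join(map(str, a)), p)   # 685931724 [5, 2, 2, 2, 2, 2, 1, 1, 1]
```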

Conversely, we can easily go back to $(q_1, q_2, \ldots, q_n)$ when $a_1 a_2 \ldots a_n$ and $(p_1, p_2, \ldots, p_n)$ are given. (See exercise 17.) So the desired correspondence has been established, and MacMahon’s index theorem has been proved.
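One natural way to go back, sketched here in Python under our own naming, is to restore the 1's that were subtracted at each descent and then put each $p_j$ back into position $a_j$; exercise 17 explores this direction further.

```python
def macmahon_inverse(a, p):
    """Recover (q_1, ..., q_n) from a permutation a of 1..n (a list) and a
    nonincreasing list p of nonnegative integers."""
    n = len(a)
    p = list(p)                               # work on a copy
    # Undo the subtractions: for each descent a_j > a_{j+1} (1 <= j < n),
    # add 1 back to p_1, ..., p_j.
    for j in range(1, n):
        if a[j - 1] > a[j]:
            for i in range(j):
                p[i] += 1
    # Now p_j = q_{a_j}; put each value back at its original position a_j.
    q = [0] * n
    for j in range(n):
        q[a[j] - 1] = p[j]
    return q


print(macmahon_inverse([6, 8, 5, 9, 3, 1, 7, 2, 4], [5, 2, 2, 2, 2, 2, 1, 1, 1]))
# [3, 1, 4, 1, 5, 9, 2, 6, 5]
```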

D. Foata and M. P. Schützenberger discovered a surprising extension of MacMahon’s theorem, about 65 years after MacMahon’s original publication: The number of permutations of $n$ elements that have $k$ inversions and index $l$ is the same as the number that have $l$ inversions and index $k$. In fact, Foata and Schützenberger found a simple one-to-one correspondence between permutations of the first kind and permutations of the second (see exercise 25).
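The symmetry found by Foata and Schützenberger is visible in a brute-force tabulation of the joint distribution of (inversions, index). The Python sketch below (names ours) builds that table for small $n$ and checks that swapping the two coordinates leaves every count unchanged; again this is evidence for small $n$, not a proof.

```python
from collections import Counter
from itertools import permutations


def inversions(a):
    """Number of pairs i < j with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def index(a):
    """Sum of the 1-based positions j with a_j > a_{j+1}."""
    return sum(j + 1 for j in range(len(a) - 1) if a[j] > a[j + 1])


def joint_distribution(n):
    """Counter mapping (inversions, index) to the number of permutations of {1..n}."""
    return Counter((inversions(p), index(p)) for p in permutations(range(1, n + 1)))


def is_symmetric(n):
    """True if #(inv = k, ind = l) equals #(inv = l, ind = k) for every pair (k, l)."""
    joint = joint_distribution(n)
    # Counter returns 0 for missing keys without inserting them.
    return all(joint[(k, l)] == joint[(l, k)] for (k, l) in joint)


print([is_symmetric(n) for n in range(1, 7)])
```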

EXERCISES 

1. [10] What is the inversion table for the permutation 271845936? What permutation has the inversion table 50121200?

2. [M20] In the classical problem of Josephus (exercise 1.3.2-22), $n$ men are initially arranged in a circle; the $m$th man is executed, the circle closes, and every $m$th man is repeatedly eliminated until all are dead. The resulting execution order is a permutation of $\{1, 2, \ldots, n\}$. For example, when $n = 8$ and $m = 4$ the order is 54613872 (man 1 is 5th out, etc.); the inversion table corresponding to this permutation is 36310010. Give a simple recurrence relation for the elements $b_1 b_2 \ldots b_n$ of the inversion table in the general Josephus problem for $n$ men, when every $m$th man is executed.
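A short simulation shows how the execution order and its inversion table arise in the example. In this Python sketch the names `josephus_order` and `inversion_table` are illustrative; $b_j$ is computed as the number of elements to the left of $j$ that are greater than $j$. It reproduces 54613872 and 36310010 for $n = 8$, $m = 4$, but it does not supply the recurrence the exercise asks for.

```python
def josephus_order(n, m):
    """Return a_1 ... a_n, where a_k is the step at which man k is executed when
    men 1..n stand in a circle and every mth man is eliminated."""
    circle = list(range(1, n + 1))
    when = [0] * (n + 1)                      # when[k] = execution step of man k
    pos = 0
    for step in range(1, n + 1):
        pos = (pos + m - 1) % len(circle)     # count off m men, starting at pos
        man = circle.pop(pos)                 # the circle closes up after removal
        when[man] = step
    return when[1:]


def inversion_table(a):
    """b_j = number of elements to the left of j that are greater than j."""
    where = {v: i for i, v in enumerate(a)}
    return [sum(1 for x in a[:where[j]] if x > j) for j in range(1, len(a) + 1)]


a = josephus_order(8, 4)
print("".join(map(str, a)), "".join(map(str, inversion_table(a))))   # 54613872 36310010
```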

3. [18] If the permutation $a_1 a_2 \ldots a_n$ corresponds to the inversion table $b_1 b_2 \ldots b_n$, what is the permutation $\bar a_1 \bar a_2 \ldots \bar a_n$ that corresponds to the inversion table

$$(n - 1 - b_1)(n - 2 - b_2) \ldots (0 - b_n)\,?$$


► 4. [20] Design an algorithm suitable for computer implementation that constructs the permutation $a_1 a_2 \ldots a_n$ corresponding to a given inversion table $b_1 b_2 \ldots b_n$ satisfying (3). [Hint: Consider a linked-memory technique.]

5. [35] The algorithm of exercise 4 requires an execution time roughly proportional to $n + b_1 + \cdots + b_n$ on typical computers, and this is $\Theta(n^2)$ on the average. Is there an algorithm whose worst-case running time is substantially better than order $n^2$?

► 6. [26] Design an algorithm that computes the inversion table $b_1 b_2 \ldots b_n$ corresponding to a given permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, where the running time is essentially proportional to $n \log n$ on typical computers.

7. [20] Several other kinds of inversion tables can be defined, corresponding to a given permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, besides the particular table $b_1 b_2 \ldots b_n$ defined in the text; in this exercise we will consider three other types of inversion tables that arise in applications.

Let $c_j$ be the number of inversions whose first component is $j$, that is, the number of elements to the right of $j$ that are less than $j$. [Corresponding to (1) we have the table 0 0 0 1 4 2 1 5 7; clearly $0 \le c_j < j$.] Let $B_j = b_{a_j}$ and $C_j = c_{a_j}$.

Show that $0 \le B_j < j$ and $0 \le C_j \le n - j$, for $1 \le j \le n$; furthermore show that the permutation $a_1 a_2 \ldots a_n$ can be determined uniquely when either $c_1 c_2 \ldots c_n$ or $B_1 B_2 \ldots B_n$ or $C_1 C_2 \ldots C_n$ is given.

8. [M24] Continuing the notation of exercise 7, let $a'_1 a'_2 \ldots a'_n$ be the inverse of the permutation $a_1 a_2 \ldots a_n$, and let the corresponding inversion tables be $b'_1 b'_2 \ldots b'_n$, $c'_1 c'_2 \ldots c'_n$, $B'_1 B'_2 \ldots B'_n$, and $C'_1 C'_2 \ldots C'_n$. Find as many interesting relations as you can between the numbers $a_j, b_j, c_j, B_j, C_j, a'_j, b'_j, c'_j, B'_j, C'_j$.

► 9. [M21] Prove that, in the notation of exercise 7, the permutation $a_1 a_2 \ldots a_n$ is an involution (that is, its own inverse) if and only if $b_j = C_j$ for $1 \le j \le n$.

10. [HM20] Consider Fig. 1 as a polyhedron in three dimensions. What is the diameter of the truncated octahedron (the distance between vertex 1234 and vertex 4321) if all of its edges have unit length?

11. [M25] If $\pi = a_1 a_2 \ldots a_n$ is a permutation of $\{1, 2, \ldots, n\}$, let

$$E(\pi) = \{(a_i, a_j) \mid i < j,\ a_i > a_j\}$$

be the set of its inversions, and let

$$\bar E(\pi) = \{(a_i, a_j) \mid i > j,\ a_i > a_j\}$$

be the non-inversions.

a) Prove that $E(\pi)$ and $\bar E(\pi)$ are transitive. (A set $S$ of ordered pairs is called transitive if $(a, c)$ is in $S$ whenever both $(a, b)$ and $(b, c)$ are in $S$.)
b) Conversely, let $E$ be any transitive subset of $T = \{(x, y) \mid 1 \le y < x \le n\}$ whose complement $\bar E = T \setminus E$ is also transitive. Prove that there exists a permutation $\pi$ such that $E(\pi) = E$.

12. [M28] Continuing the notation of the previous exercise, prove that if $\pi_1$ and $\pi_2$ are permutations and if $E$ is the smallest transitive set containing $E(\pi_1) \cup E(\pi_2)$, then $\bar E$ is transitive. [Hence, if we say $\pi_1$ is “above” $\pi_2$ whenever $E(\pi_1) \subseteq E(\pi_2)$, a lattice of permutations is defined; there is a unique “lowest” permutation “above” two given permutations. Figure 1 is the lattice diagram when $n = 4$.]

13. [M23] It is well known that half of the terms in the expansion of a determinant have a plus sign, and half have a minus sign. In other words, there are just as many permutations with an even number of inversions as with an odd number, when $n \ge 2$. Show that, in general, the number of permutations having a number of inversions congruent to $t$ modulo $m$ is $n!/m$, regardless of the integer $t$, whenever $n \ge m$.

14. [M24] (F. Franklin.) A partition of $n$ into $k$ distinct parts is a representation $n = p_1 + p_2 + \cdots + p_k$, where $p_1 > p_2 > \cdots > p_k > 0$. For example, the partitions of 7 into distinct parts are $7$, $6 + 1$, $5 + 2$, $4 + 3$, $4 + 2 + 1$. Let $f_k(n)$ be the number of partitions of $n$ into $k$ distinct parts; prove that $\sum_k (-1)^k f_k(n) = 0$, unless $n$ has the form $(3j^2 \pm j)/2$, for some nonnegative integer $j$; in the latter case the sum is $(-1)^j$. For example, when $n = 7$ the sum is $-1 + 3 - 1 = 1$, and $7 = (3 \cdot 2^2 + 2)/2$. [Hint: Represent a partition as an array of dots, putting $p_i$ dots in the $i$th row, for $1 \le i \le k$. Find the smallest $j$ such that $p_{j+1} < p_j - 1$, and encircle the rightmost dots in the first $j$ rows. If $j < p_k$, these $j$ dots can usually be removed, tilted 45°, and placed as a new $(k+1)$st row. On the other hand if $j \ge p_k$, the $k$th row of dots can usually be removed, tilted 45°, and placed to the right of the circled dots. (See Fig. 2.) This process pairs off partitions having an odd number of rows with partitions having an even number of rows, in most cases, so only unpaired partitions must be considered in the sum.]


Fig. 2. Franklin’s correspondence between partitions with distinct parts.


Note: As a consequence, we obtain Euler’s formula

$$(1 - z)(1 - z^2)(1 - z^3) \ldots = 1 - z - z^2 + z^5 + z^7 - z^{12} - z^{15} + \cdots = \sum_{-\infty < j < \infty} (-1)^j z^{(3j^2 + j)/2}.$$

The generating function for ordinary partitions (whose parts are not necessarily distinct) is $\sum p(n) z^n = 1/\bigl((1 - z)(1 - z^2)(1 - z^3) \ldots\bigr)$; hence we obtain a nonobvious recurrence relation for the partition numbers,

$$p(n) = p(n - 1) + p(n - 2) - p(n - 5) - p(n - 7) + p(n - 12) + p(n - 15) - \cdots.$$
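Euler’s recurrence gives a fast way to tabulate $p(n)$, since only $O(\sqrt{n})$ earlier values enter each step. The Python sketch below (function names ours) uses the generalized pentagonal offsets $j(3j-1)/2$ and $j(3j+1)/2$, which are the exponents in Euler’s formula, and cross-checks the result against a direct expansion of $1/\bigl((1-z)(1-z^2)\ldots\bigr)$.

```python
def partition_numbers(N):
    """p(0), ..., p(N) from Euler's recurrence
    p(n) = sum over j >= 1 of (-1)^(j+1) [p(n - j(3j-1)/2) + p(n - j(3j+1)/2)]."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, j = 0, 1
        while j * (3 * j - 1) // 2 <= n:
            sign = 1 if j % 2 == 1 else -1
            total += sign * p[n - j * (3 * j - 1) // 2]      # offsets 1, 5, 12, ...
            if j * (3 * j + 1) // 2 <= n:
                total += sign * p[n - j * (3 * j + 1) // 2]  # offsets 2, 7, 15, ...
            j += 1
        p[n] = total
    return p


def partition_numbers_direct(N):
    """The same numbers from 1/((1-z)(1-z^2)...) by the usual counting recurrence."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for m in range(part, N + 1):
            p[m] += p[m - part]
    return p


print(partition_numbers(10))   # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```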
15. [M22] Prove that (16) is the generating function for partitions into at most $n$ parts; that is, prove that the coefficient of $z^m$ in $1/\bigl((1 - z)(1 - z^2) \ldots (1 - z^n)\bigr)$ is the number of ways to write $m = p_1 + p_2 + \cdots + p_n$ with $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$. [Hint: Drawing dots as in exercise 14, show that there is a one-to-one correspondence between $n$-tuples $(p_1, p_2, \ldots, p_n)$ such that $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$ and sequences $(P_1, P_2, P_3, \ldots)$ such that $n \ge P_1 \ge P_2 \ge P_3 \ge \cdots \ge 0$, with the property that $p_1 + p_2 + \cdots + p_n = P_1 + P_2 + P_3 + \cdots$. In other words, partitions into at most $n$ parts correspond to partitions into parts not exceeding $n$.]

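16. [M25] (L. Euler.) Prove the following identities by interpreting both sides of the equations in terms of partitions:

$$\prod_{k \ge 0} \frac{1}{1 - q^k z} = \frac{1}{(1 - z)(1 - qz)(1 - q^2 z) \ldots} = 1 + \frac{z}{1 - q} + \frac{z^2}{(1 - q)(1 - q^2)} + \cdots = \sum_{n \ge 0} z^n \Big/ \prod_{k=1}^{n} (1 - q^k);$$

$$\prod_{k \ge 0} (1 + q^k z) = (1 + z)(1 + qz)(1 + q^2 z) \ldots = 1 + \frac{z}{1 - q} + \frac{z^2 q}{(1 - q)(1 - q^2)} + \cdots = \sum_{n \ge 0} z^n q^{n(n-1)/2} \Big/ \prod_{k=1}^{n} (1 - q^k).$$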


17. [20] In MacMahon’s correspondence defined at the end of this section, what are the 24 quadruples $(q_1, q_2, q_3, q_4)$ for which $(p_1, p_2, p_3, p_4) = (0, 0, 0, 0)$?

18. [M30] (T. Hibbard, CACM 6 (1963), 210.) Let $n > 0$, and assume that a sequence of $2^n$ $n$-bit integers $X_0, \ldots, X_{2^n - 1}$ has been generated at random, where each bit of each number is independently equal to 1 with probability $p$. Consider the sequence $X_0 \oplus 0, X_1 \oplus 1, \ldots, X_{2^n - 1} \oplus (2^n - 1)$, where $\oplus$ denotes the “exclusive or” operation on the binary representations. Thus if $p = 0$, the sequence is $0, 1, \ldots, 2^n - 1$, and if $p = 1$ it is $2^n - 1, \ldots, 1, 0$; and when $p = \frac{1}{2}$, each element of the sequence is a random integer between 0 and $2^n - 1$. For general $p$ this is a useful way to generate a sequence of random integers with a biased number of inversions, although the distribution of the elements of the sequence taken as a whole is uniform in the sense that each $n$-bit integer has the same distribution. What is the average number of inversions in such a sequence, as a function of the probability $p$?
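Hibbard’s construction is easy to simulate. The Python sketch below (names hypothetical) builds one such sequence and counts its inversions by the quadratic method; averaging over many trials gives only a Monte Carlo estimate of the quantity the exercise asks for, not its exact value. The extreme cases $p = 0$ and $p = 1$ reproduce the two sequences mentioned in the statement.

```python
import random


def hibbard_sequence(n, p, rng=None):
    """X_0 xor 0, X_1 xor 1, ..., X_{2^n - 1} xor (2^n - 1), where every bit of
    every n-bit X_i is independently 1 with probability p."""
    if rng is None:
        rng = random.Random()
    seq = []
    for i in range(2 ** n):
        x = 0
        for bit in range(n):
            if rng.random() < p:              # this bit of X_i is 1 with probability p
                x |= 1 << bit
        seq.append(x ^ i)                      # "exclusive or" with the position i
    return seq


def inversions(seq):
    """Quadratic inversion count."""
    return sum(1 for i in range(len(seq))
               for j in range(i + 1, len(seq)) if seq[i] > seq[j])


rng = random.Random(2024)                      # fixed seed, for repeatability only
trials = [inversions(hibbard_sequence(4, 0.25, rng)) for _ in range(200)]
print(sum(trials) / len(trials))               # a Monte Carlo estimate, not an exact value
```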

19. [M28] (C. Meyer.) When $m$ is relatively prime to $n$, we know that the sequence $(m \bmod n)(2m \bmod n) \ldots ((n - 1)m \bmod n)$ is a permutation of $\{1, 2, \ldots, n - 1\}$. Show that the number of inversions of this permutation can be expressed in terms of Dedekind sums (see Section 3.3.3).

20. [M43] The following famous identity due to Jacobi [Fundamenta Nova Theoriæ Functionum Ellipticarum (1829), §64] is the basis of many remarkable relationships involving elliptic functions:

$$\prod_{k \ge 1} (1 - u^k v^{k-1})(1 - u^{k-1} v^k)(1 - u^k v^k)$$
$$= (1 - u)(1 - v)(1 - uv)(1 - u^2 v)(1 - u v^2)(1 - u^2 v^2) \ldots$$
$$= 1 - (u + v) + (u^3 v + u v^3) - (u^6 v^3 + u^3 v^6) + \cdots$$
$$= \sum_{-\infty < j < +\infty} (-1)^j u^{j(j+1)/2} v^{j(j-1)/2}.$$

For example, if we set $u = z$, $v = z^2$, we obtain Euler’s formula of exercise 14. If we set $z = \sqrt{u/v}$, $q = \sqrt{uv}$, we obtain

$$\prod_{k \ge 1} (1 - q^{2k-1} z)(1 - q^{2k-1} z^{-1})(1 - q^{2k}) = \sum_{-\infty < n < \infty} (-1)^n z^n q^{n^2}.$$

Is there a combinatorial proof of Jacobi’s identity, analogous to Franklin’s proof of the special case in exercise 14? (Thus we want to consider “complex partitions”

$$m + ni = (p_1 + q_1 i) + (p_2 + q_2 i) + \cdots + (p_k + q_k i)$$

where the $p_j + q_j i$ are distinct nonzero complex numbers, $p_j$ and $q_j$ being nonnegative integers with $|p_j - q_j| \le 1$. Jacobi’s identity says that the number of such representations with $k$ even is the same as the number with $k$ odd, except when $m$ and $n$ are consecutive triangular numbers.) What other remarkable properties do complex partitions have?

► 21. [M25] (G. D. Knott.) Show that the permutation $a_1 \ldots a_n$ is obtainable with a stack, in the sense of exercise 2.2.1-5 or 2.3.1-6, if and only if $C_j \le C_{j+1} + 1$ for $1 \le j < n$ in the notation of exercise 7.

22. [M26] Given a permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, let $h_j$ be the number of indices $i < j$ such that $a_i \in \{a_j + 1, a_j + 2, \ldots, a_{j+1}\}$. (If $a_{j+1} < a_j$, the elements of this set “wrap around” from $n$ to 1. When $j = n$ we use the set $\{a_n + 1, a_n + 2, \ldots, n\}$.) For example, the permutation 591826473 leads to $h_1 \ldots h_9 = 0\,0\,1\,2\,1\,4\,2\,4\,6$.

a) Prove that $a_1 a_2 \ldots a_n$ can be reconstructed from the numbers $h_1 h_2 \ldots h_n$.

b) Prove that $h_1 + h_2 + \cdots + h_n$ is the index of $a_1 a_2 \ldots a_n$.
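A direct transcription of the definition of $h_j$ into Python (names ours) reproduces the example and confirms part (b) by exhaustive search for small $n$; that is evidence, not a proof.

```python
from itertools import permutations


def h_numbers(a):
    """h_j = number of i < j with a_i in {a_j + 1, ..., a_{j+1}}, where the set
    wraps around from n to 1 if a_{j+1} < a_j; for j = n the set is {a_n + 1, ..., n}."""
    n = len(a)
    h = []
    for j in range(n):                       # 0-based position j
        if j < n - 1:
            s, v = set(), a[j]
            while v != a[j + 1]:             # step a_j + 1, a_j + 2, ... cyclically
                v = v % n + 1
                s.add(v)
        else:
            s = set(range(a[j] + 1, n + 1))
        h.append(sum(1 for i in range(j) if a[i] in s))
    return h


def index(a):
    """Sum of the 1-based positions j with a_j > a_{j+1}."""
    return sum(j + 1 for j in range(len(a) - 1) if a[j] > a[j + 1])


def sum_equals_index(n):
    """Check part (b) on every permutation of {1, ..., n} (evidence, not a proof)."""
    return all(sum(h_numbers(p)) == index(p) for p in permutations(range(1, n + 1)))


print(h_numbers([5, 9, 1, 8, 2, 6, 4, 7, 3]))   # [0, 0, 1, 2, 1, 4, 2, 4, 6]
```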

► 23. [M27] (Russian roulette.) A group of $n$ condemned men who prefer probability theory to number theory might choose to commit suicide by sitting in a circle and modifying Josephus’s method (exercise 2) as follows: The first prisoner holds a gun and aims it at his head; with probability $p$ he dies and leaves the circle. Then the second man takes the gun and proceeds in the same way. Play continues cyclically, with constant probability $p > 0$, until everyone is dead.

Let $a_j = k$ if man $k$ is the $j$th to die. Prove that the death order $a_1 a_2 \ldots a_n$ occurs with a probability that is a function only of $n$, $p$, and the index of the dual permutation $(n + 1 - a_n) \ldots (n + 1 - a_2)(n + 1 - a_1)$. What death order is least likely?

24. [M26] Given integers $t(1)\, t(2) \ldots t(n)$ with $t(j) \ge j$, the generalized index of a permutation $a_1 a_2 \ldots a_n$ is the sum of all subscripts $j$ such that $a_j > t(a_{j+1})$, plus the total number of inversions such that $i < j$ and $t(a_j) \ge a_i > a_j$. Thus when $t(j) = j$ for all $j$, the generalized index is the same as the index; but when $t(j) \ge n$ for all $j$ it is the number of inversions. Prove that the number of permutations whose generalized index equals $k$ is the same as the number of permutations having $k$ inversions. [Hint: Show that, if we take any permutation $a_1 \ldots a_{n-1}$ of $\{1, \ldots, n - 1\}$ and insert the number $n$ in all possible places, we increase the generalized index by the numbers $\{0, 1, \ldots, n - 1\}$ in some order.]

► 25. [M30] (Foata and Schützenberger.) If $\alpha = a_1 \ldots a_n$ is a permutation, let $\operatorname{ind}(\alpha)$ be its index, and let $\operatorname{inv}(\alpha)$ count its inversions.

a) Define a one-to-one correspondence that takes each permutation $\alpha$ of $\{1, \ldots, n\}$ to a permutation $f(\alpha)$ that has the following two properties: (i) $\operatorname{ind}(f(\alpha)) = \operatorname{inv}(\alpha)$; (ii) for $1 \le j < n$, the number $j$ appears to the left of $j + 1$ in $f(\alpha)$ if and only if it appears to the left of $j + 1$ in $\alpha$. What permutation does your construction assign to $f(\alpha)$ when $\alpha = 198263745$? For what permutation $\alpha$ is $f(\alpha) = 198263745$? [Hint: If $n > 1$, write $\alpha = x_1 \alpha_1 x_2 \alpha_2 \ldots x_k \alpha_k a_n$, where $x_1, \ldots, x_k$ are all the elements $< a_n$ if $a_1 < a_n$, otherwise $x_1, \ldots, x_k$ are all the elements $> a_n$; the other elements appear in (possibly empty) strings $\alpha_1, \ldots, \alpha_k$. Compare the number of inversions of $h(\alpha) = \alpha_1 x_1 \alpha_2 x_2 \ldots \alpha_k x_k$ to $\operatorname{inv}(\alpha)$; in this construction the number $a_n$ does not appear in $h(\alpha)$.]

b) Use $f$ to define another one-to-one correspondence $g$ having the following two properties: (i) $\operatorname{ind}(g(\alpha)) = \operatorname{inv}(\alpha)$; (ii) $\operatorname{inv}(g(\alpha)) = \operatorname{ind}(\alpha)$. [Hint: Consider inverse permutations.]

26. [M25] What is the statistical correlation coefficient between the number of inversions and the index of a random permutation? (See Eq. 3.3.2-(24).)

27. [M37] Prove that, in addition to (15), there is a simple relationship between $\operatorname{inv}(a_1 a_2 \ldots a_n)$ and the $n$-tuple $(q_1, q_2, \ldots, q_n)$. Use this fact to generalize the derivation of (17), obtaining an algebraic characterization of the bivariate generating function

$$H_n(w, z) = \sum w^{\operatorname{inv}(a_1 a_2 \ldots a_n)} z^{\operatorname{ind}(a_1 a_2 \ldots a_n)},$$

where the sum is over all $n!$ permutations $a_1 a_2 \ldots a_n$.

► 28. [25] If $a_1 a_2 \ldots a_n$ is a permutation of $\{1, 2, \ldots, n\}$, its total displacement is defined to be $\sum_{j=1}^{n} |a_j - j|$. Find upper and lower bounds for total displacement in terms of the number of inversions.

29. [28] If $\pi = a_1 a_2 \ldots a_n$ and $\pi' = a'_1 a'_2 \ldots a'_n$ are permutations of $\{1, 2, \ldots, n\}$, their product $\pi\pi'$ is $a'_{a_1} a'_{a_2} \ldots a'_{a_n}$. Let $\operatorname{inv}(\pi)$ denote the number of inversions, as in exercise 25. Show that $\operatorname{inv}(\pi\pi') \le \operatorname{inv}(\pi) + \operatorname{inv}(\pi')$, and that equality holds if and only if $\pi\pi'$ is “below” $\pi'$ in the sense of exercise 12.

*5.1.2. Permutations of a Multiset

So far we have been discussing permutations of a set of elements; this is just a special case of the concept of permutations of a multiset. (A multiset is like a set except that it can have repetitions of identical elements. Some basic properties of multisets have been discussed in exercise 4.6.3-19.)

For example, consider the multiset

$$M = \{a, a, a, b, b, c, d, d, d, d\}, \qquad (1)$$

which contains 3 a’s, 2 b’s, 1 c, and 4 d’s. We may also indicate the multiplicities of elements in another way, namely

$$M = \{3 \cdot a,\ 2 \cdot b,\ c,\ 4 \cdot d\}. \qquad (2)$$

A permutation* of $M$ is an arrangement of its elements into a row; for example,

$$c\,a\,b\,d\,d\,a\,b\,d\,a\,d.$$

From another point of view we would call this a string of letters, containing 3 a’s, 2 b’s, 1 c, and 4 d’s.

How many permutations of $M$ are possible? If we regarded the elements of $M$ as distinct, by subscripting them $a_1, a_2, a_3, b_1, b_2, c_1, d_1, d_2, d_3, d_4$, we would have $10! = 3{,}628{,}800$ permutations; but many of those permutations would actually be the same when we removed the subscripts. In fact, each permutation of $M$ would occur exactly $3!\,2!\,1!\,4! = 288$ times, since we can start with any permutation of $M$ and put subscripts on the a’s in $3!$ ways, on the b’s (independently) in $2!$ ways, on the c in 1 way, and on the d’s in $4!$ ways. Therefore the true number of permutations of $M$ is

$$\frac{10!}{3!\,2!\,1!\,4!} = 12{,}600.$$

In general, we can see by this same argument that the number of permutations of any multiset is the multinomial coefficient

$$\binom{n}{n_1, n_2, \ldots} = \frac{n!}{n_1!\,n_2!\,\ldots}, \qquad (3)$$

where $n_1$ is the number of elements of one kind, $n_2$ is the number of another kind, etc., and $n = n_1 + n_2 + \cdots$ is the total number of elements.
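Formula (3) is a one-liner in Python. The sketch below (names illustrative) also counts distinct arrangements directly for a short string as a sanity check; `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def multiset_permutation_count(items):
    """Multinomial coefficient n! / (n_1! n_2! ...) from formula (3)."""
    counts = Counter(items)
    return factorial(len(items)) // prod(factorial(c) for c in counts.values())


def brute_force_count(items):
    """Number of distinct arrangements, found by listing them (short inputs only)."""
    return len(set(permutations(items)))


print(multiset_permutation_count("aaabbcdddd"))                        # 12600
print(multiset_permutation_count("aabbc"), brute_force_count("aabbc"))  # 30 30
```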

The number of permutations of a set has been known for more than 1500 years. The Hebrew Book of Creation (c. A.D. 400), which was the earliest literary product of Jewish philosophical mysticism, gives the correct values of the first seven factorials, after which it says “Go on and compute what the mouth cannot express and the ear cannot hear.” [Sefer Yetzirah, end of Chapter 4. See Solomon Gandz, Studies in Hebrew Astronomy and Mathematics (New York: Ktav, 1970), 494-496; Aryeh Kaplan, Sefer Yetzirah (York Beach, Maine: Samuel Weiser, 1993).] This is one of the first two known enumerations of permutations in history. The other occurs in the Indian classic Anuyogadvarasutra (c. 500), rule 97, which gives the formula

$$6 \times 5 \times 4 \times 3 \times 2 \times 1 - 2$$

for the number of permutations of six elements that are neither in ascending nor descending order. [See G. Chakravarti, Bull. Calcutta Math. Soc. 24 (1932), 79-88. The Anuyogadvarasutra is one of the books in the canon of Jainism, a religious sect that flourishes in India.]

The corresponding formula for permutations of multisets seems to have appeared first in the Lilavati of Bhaskara (c. 1150), sections 270-271. Bhaskara stated the rule rather tersely, and illustrated it only with two simple examples $\{2, 2, 1, 1\}$ and $\{4, 8, 5, 5, 5\}$. Consequently the English translations of his work do not all state the rule correctly, although there is little doubt that Bhaskara knew what he was talking about. He went on to give the interesting formula

$$\frac{(4 + 8 + 5 + 5 + 5) \times 120 \times 11111}{5 \times 6}$$

for the sum of the 20 numbers $48555 + 45855 + \cdots$.
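Bhaskara’s formula can be checked directly. In this Python sketch (names ours) we generate all distinct arrangements of the digits 4, 8, 5, 5, 5 and add them up.

```python
from itertools import permutations


def distinct_numbers(digits):
    """All distinct integers obtained by arranging the given digit string."""
    return sorted({int("".join(p)) for p in permutations(digits)})


nums = distinct_numbers("48555")
print(len(nums), sum(nums))                              # 20 1199988
print((4 + 8 + 5 + 5 + 5) * 120 * 11111 // (5 * 6))     # 1199988
```

The reason the formula works: if the three 5’s are temporarily labeled, each of the five digits occupies a given position in $4!$ of the $5! = 120$ labeled arrangements, and every distinct number arises from $3! = 6$ labeled ones. So each position contributes $(4 + 8 + 5 + 5 + 5) \times 4!/3! = 27 \times 4$, and $4 = 120/(5 \times 6)$; multiplying by $11111$ accounts for the five positions.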

The correct rule for counting permutations when elements are repeated was apparently unknown in Europe until Marin Mersenne stated it without proof as Proposition 10 in his elaborate treatise on melodic principles [Harmonie Universelle 2, also entitled Traitez de la Voix et des Chants (1636), 129-130]. Mersenne was interested in the number of tunes that could be made from a given collection of notes; he observed, for example, that a theme by Boesset [musical notation not reproduced] can be rearranged in exactly $15!/(4!\,3!\,3!\,2!) = 756{,}756{,}000$ ways.

The general rule (3) also appeared in Jean Prestet’s Elemens des Mathematiques (Paris: 1675), 351-352, one of the very first expositions of combinatorial mathematics to be written in the Western world. Prestet stated the rule correctly for a general multiset, but illustrated it only in the simple case $\{a, a, b, b, c, c\}$. A few years later, John Wallis’s Discourse of Combinations (Oxford: 1685), Chapter 2 (published with his Treatise of Algebra) gave a clearer and somewhat more detailed discussion of the rule.

In 1965, Dominique Foata introduced an ingenious idea called the “intercalation product,” which makes it possible to extend many of the known results about ordinary permutations to the general case of multiset permutations. [See Publ. Inst. Statistique, Univ. Paris, 14 (1965), 81-241; also Lecture Notes in Math. 85 (Springer, 1969).] Assuming that the elements of a multiset have been linearly ordered in some way, we may consider a two-line notation such as

$$\begin{pmatrix} a & a & a & b & b & c & d & d & d & d \\ c & a & b & d & d & a & b & d & a & d \end{pmatrix}, \qquad (4)$$

where the top line contains the elements of $M$ sorted into nondecreasing order and the bottom line is the permutation itself. The intercalation product $\alpha \top \beta$ of two multiset permutations $\alpha$ and $\beta$ is obtained by (a) expressing $\alpha$ and $\beta$ in the two-line notation, (b) juxtaposing these two-line representations, and (c) sorting the columns into nondecreasing order of the top line. The sorting is supposed to be stable, in the sense that left-to-right order of elements in the bottom line is preserved when the corresponding top line elements are equal. For example, $c\,a\,d\,a\,b \top b\,d\,d\,a\,d = c\,a\,b\,d\,d\,a\,b\,d\,a\,d$, since

$$\begin{pmatrix} a & a & b & c & d \\ c & a & d & a & b \end{pmatrix} \top \begin{pmatrix} a & b & d & d & d \\ b & d & d & a & d \end{pmatrix} = \begin{pmatrix} a & a & a & b & b & c & d & d & d & d \\ c & a & b & d & d & a & b & d & a & d \end{pmatrix}. \qquad (5)$$
It is easy to see that the intercalation product is associative:

$$(\alpha \top \beta) \top \gamma = \alpha \top (\beta \top \gamma); \qquad (6)$$

it also satisfies two cancellation laws:

$$\pi \top \alpha = \pi \top \beta \ \text{implies}\ \alpha = \beta, \qquad \alpha \top \pi = \beta \top \pi \ \text{implies}\ \alpha = \beta. \qquad (7)$$

There is an identity element,

$$\alpha \top \epsilon = \epsilon \top \alpha = \alpha, \qquad (8)$$

where $\epsilon$ is the null permutation, the “arrangement” of the empty set. Although the commutative law is not valid in general (see exercise 2), we do have

$$\alpha \top \beta = \beta \top \alpha \quad \text{if } \alpha \text{ and } \beta \text{ have no letters in common.} \qquad (9)$$
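Steps (a), (b), and (c) of the definition map directly onto a stable sort. In this Python sketch (names ours), a multiset permutation is a string and the letters are ordered alphabetically; Python’s `list.sort` is guaranteed stable, which is exactly what the definition requires.

```python
def two_line(perm):
    """Columns (top, bottom) of the two-line notation: top line = sorted(perm)."""
    return list(zip(sorted(perm), perm))


def intercalate(alpha, beta):
    """Intercalation product of two multiset permutations given as strings."""
    columns = two_line(alpha) + two_line(beta)   # (b) juxtapose the two arrays
    columns.sort(key=lambda col: col[0])         # (c) stable sort on the top line
    return "".join(bottom for _, bottom in columns)


print(intercalate("cadab", "bddad"))   # cabddabdad, as in (5)
```

Associativity (6), the identity (8), and the commutation rule (9) can be spot-checked with the same function; for instance, `intercalate("ba", "dc")` and `intercalate("dc", "ba")` both give `badc`.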

In an analogous fashion we can extend the concept of cycles in permutations to cases where elements are repeated; we let

$$(x_1\ x_2\ \ldots\ x_n) \qquad (10)$$

stand for the permutation obtained in two-line form by sorting the columns of

$$\begin{pmatrix} x_1 & x_2 & \ldots & x_n \\ x_2 & x_3 & \ldots & x_1 \end{pmatrix}$$

by their top elements in a stable manner. For example, we have

$$(d\ b\ d\ d\ a\ c\ a\ a\ b\ d) = \begin{pmatrix} d & b & d & d & a & c & a & a & b & d \\ b & d & d & a & c & a & a & b & d & d \end{pmatrix} = \begin{pmatrix} a & a & a & b & b & c & d & d & d & d \\ c & a & b & d & d & a & b & d & a & d \end{pmatrix}, \qquad (11)$$

so the permutation (4) is actually a cycle. We might render this cycle in words by saying something like “d goes to b goes to d goes to d goes … goes to d goes back.” Note that these general cycles do not share all of the properties of ordinary cycles; $(x_1\ x_2\ \ldots\ x_n)$ is not always the same as $(x_2\ \ldots\ x_n\ x_1)$.
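Definition (10) is again a stable sort, this time of the columns $\binom{x_i}{x_{i+1}}$ taken cyclically. This Python sketch (function name ours) reproduces (11) and shows that rotating the letters of a generalized cycle can change the permutation it denotes.

```python
def cycle(xs):
    """Generalized cycle (x_1 x_2 ... x_n) of a string xs: stably sort the columns
    x_1 over x_2, x_2 over x_3, ..., x_n over x_1 by their top letters and
    return the bottom line."""
    n = len(xs)
    columns = [(xs[i], xs[(i + 1) % n]) for i in range(n)]
    columns.sort(key=lambda col: col[0])       # list.sort is stable
    return "".join(bottom for _, bottom in columns)


print(cycle("dbddacaabd"))   # cabddabdad, as in (11)
print(cycle("bddacaabdd"))   # cabddadadb: rotating the letters changed the result
```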

We observed in Section 1.3.3 that every permutation of a set has a unique representation (up to order) as a product of disjoint cycles, where the “product” of permutations is defined by a law of composition. It is easy to see that the product of disjoint cycles is exactly the same as their intercalation; this suggests that we might be able to generalize the previous results, obtaining a unique representation (in some sense) for any permutation of a multiset, as the intercalation of cycles. In fact there are at least two natural ways to do this, each of which has important applications.

Equation (5) shows one way to factor cabddabdad as the intercalation of shorter permutations; let us consider the general problem of finding all factorizations $\pi = \alpha \top \beta$ of a given permutation $\pi$. It will be helpful to consider a particular permutation, such as

$$\begin{pmatrix} a & a & b & b & b & b & b & c & c & c & d & d & d & d & d \\ d & b & c & b & c & a & c & d & a & d & d & b & b & b & d \end{pmatrix}, \qquad (12)$$

as we investigate the factorization problem.

If we can write this permutation $\pi$ in the form $\alpha \top \beta$, where $\alpha$ contains the letter a at least once, then the leftmost a in the top line of the two-line notation for $\alpha$ must appear over the letter d, so $\alpha$ must also contain at least one occurrence of the letter d. If we now look at the leftmost d in the top line of $\alpha$, we see in the same way that it must appear over the letter d, so $\alpha$ must contain at least two d’s. Looking at the second d, we see that $\alpha$ also contains at least one b. We have deduced the partial result

$$\begin{pmatrix} a & b & d & d & \cdots \\ d & \ & d & b & \cdots \end{pmatrix} \qquad (13)$$
on the sole assumption that $\alpha$ is a left factor of $\pi$ containing the letter a. Proceeding in the same manner, we find that the b in the top line of (13) must appear over the letter c, etc. Eventually this process will reach the letter a again, and we can identify this a with the first a if we choose to do so. The argument we have just made essentially proves that any left factor $\alpha$ of (12) that contains the letter a has the form $(d\ d\ b\ c\ d\ b\ b\ c\ a) \top \alpha'$, for some permutation $\alpha'$. (It is convenient to write the a last in the cycle, instead of first; this arrangement is permissible since there is only one a.) Similarly, if we had assumed that $\alpha$ contains the letter b, we would have deduced that $\alpha = (c\ d\ d\ b) \top \alpha''$ for some $\alpha''$.

In general, this argument shows that, if we have any factorization $\alpha \top \beta = \pi$, where $\alpha$ contains a given letter $y$, exactly one cycle of the form

$$(x_1\ \ldots\ x_n\ y), \qquad n \ge 0, \quad x_1, \ldots, x_n \ne y, \qquad (14)$$

is a left factor of $\alpha$. This cycle is easily determined when $\pi$ and $y$ are given; it is the shortest left factor of $\pi$ that contains the letter $y$. One of the consequences of this observation is the following theorem:

Theorem A. Let the elements of the multiset $M$ be linearly ordered by the relation “$<$”. Every permutation $\pi$ of $M$ has a unique representation as the intercalation

$$\pi = (x_{11} \ldots x_{1n_1} y_1) \top (x_{21} \ldots x_{2n_2} y_2) \top \cdots \top (x_{t1} \ldots x_{tn_t} y_t), \qquad t \ge 0, \qquad (15)$$

where the following two conditions are satisfied:

$$y_1 \le y_2 \le \cdots \le y_t \quad \text{and} \quad y_i < x_{ij} \ \text{for } 1 \le j \le n_i,\ 1 \le i \le t. \qquad (16)$$

(In other words, the last element in each cycle is smaller than every other element, and the sequence of last elements is in nondecreasing order.)

Proof. If $\pi = \epsilon$, we obtain such a factorization by letting $t = 0$. Otherwise we let $y_1$ be the smallest element permuted; and we determine $(x_{11} \ldots x_{1n_1} y_1)$, the shortest left factor of $\pi$ containing $y_1$, as in the example above. Now $\pi = (x_{11} \ldots x_{1n_1} y_1) \top \rho$ for some permutation $\rho$; by induction on the length, we can write

$$\rho = (x_{21} \ldots x_{2n_2} y_2) \top \cdots \top (x_{t1} \ldots x_{tn_t} y_t), \qquad t \ge 1,$$

where (16) is satisfied. This proves the existence of such a factorization.

Conversely, to prove that the representation (15) satisfying (16) is unique, note that $t = 0$ if and only if $\pi$ is the null permutation $\epsilon$. When $t > 0$, (16) implies that $y_1$ is the smallest element permuted, and that $(x_{11} \ldots x_{1n_1} y_1)$ is the shortest left factor containing $y_1$. Therefore $(x_{11} \ldots x_{1n_1} y_1)$ is uniquely determined; by the cancellation law (7) and induction, the representation is unique. ∎

For example, the “canonical” factorization of (12), satisfying the given conditions, is

$$(d\ d\ b\ c\ d\ b\ b\ c\ a) \top (b\ a) \top (c\ d\ b) \top (d), \qquad (17)$$

if $a < b < c < d$.
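The existence half of the proof is constructive, so it translates directly into code. The Python sketch below (function name ours; letters compared alphabetically) repeatedly picks the smallest remaining element $y$ and peels off the shortest left factor containing it. It always takes the leftmost unused column with the required top letter, because in $\alpha \top \beta$ the columns of $\alpha$ precede those of $\beta$ that have the same top letter. Applied to (12) it reproduces (17).

```python
def canonical_factorization(perm):
    """Cycles (x_1 ... x_n y) of Theorem A for a multiset permutation given as a
    string; letters are compared alphabetically."""
    columns = list(zip(sorted(perm), perm))     # two-line notation, top line sorted
    cycles = []
    while columns:
        y = min(top for top, _ in columns)      # smallest element still permuted
        used, word, letter = set(), [], y
        while True:
            # In alpha T beta, alpha's columns with a given top letter precede
            # beta's, so take the leftmost unused column whose top is `letter`.
            k = next(i for i, (top, _) in enumerate(columns)
                     if top == letter and i not in used)
            used.add(k)
            letter = columns[k][1]              # follow the column to its bottom letter
            word.append(letter)
            if letter == y:                     # back at y: the cycle is complete
                break
        cycles.append("".join(word))
        # The remaining columns, in their original order, form the right factor.
        columns = [col for i, col in enumerate(columns) if i not in used]
    return cycles


cycles = canonical_factorization("dbcbcacdaddbbbd")   # the bottom line of (12)
print(cycles)            # ['ddbcdbbca', 'ba', 'cdb', 'd'], as in (17)
print("".join(cycles))   # ddbcdbbcabacdbd
```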




It is important to note that we can actually drop the parentheses and the $\top$’s in this representation, without ambiguity! Each cycle ends just after the first appearance of the smallest remaining element. So this construction associates the permutation

$$\pi' = d\,d\,b\,c\,d\,b\,b\,c\,a\,b\,a\,c\,d\,b\,d$$

with the original permutation

$$\pi = d\,b\,c\,b\,c\,a\,c\,d\,a\,d\,d\,b\,b\,b\,d.$$

Whenever the two-line representation of $\pi$ had a column of the form $\binom{y}{x}$, where $x < y$, the associated permutation $\pi'$ has a corresponding pair of adjacent elements $\ldots y\,x \ldots$. Thus our example permutation $\pi$ has three columns of the form $\binom{d}{b}$, and $\pi'$ has three occurrences of the pair $d\,b$. In general this construction establishes the following remarkable theorem:

Theorem B. Let $M$ be a multiset. There is a one-to-one correspondence between the permutations of $M$ such that, if $\pi$ corresponds to $\pi'$, the following conditions hold:

a) The leftmost element of $\pi'$ equals the leftmost element of $\pi$.

b) For all pairs of permuted elements $(x, y)$ with $x < y$, the number of occurrences of the column $\binom{y}{x}$ in the two-line notation of $\pi$ is equal to the number of times $x$ is immediately preceded by $y$ in $\pi'$. ∎

When  M is  a set,  this  is  essentially  the  same  as  the  “unusual  correspondence” 
we  discussed  near  the  end  of  Section  1.3.3,  with  unimportant  changes.  The  more 
general  result  in  Theorem  B is  quite  useful  for  enumerating  special  kinds  of 
permutations,  since  we  can  often  solve  a problem  based  on  a two-line  constraint 
more  easily  than  the  equivalent  problem  based  on  an  adjacent-pair  constraint. 

P.  A.  MacMahon  considered  problems  of  this  type  in  his  extraordinary 
book  Combinatory  Analysis  1 (Cambridge  Univ.  Press,  1915),  168-186.  He 
gave  a constructive  proof  of  Theorem  B in  the  special  case  that  M contains 
only two different kinds of elements, say a and b; his construction for this
case  is  essentially  the  same  as  that  given  here,  although  he  expressed  it  quite 
differently.  For  the  case  of  three  different  elements  a,  b,  c,  MacMahon  gave 
a complicated  nonconstructive  proof  of  Theorem  B;  the  general  case  was  first 
proved  constructively  by  Foata  [Comptes  Rendus  Acad.  Sci.  258  (Paris,  1964), 
1672-1675]. 

As  a nontrivial  example  of  Theorem  B,  let  us  find  the  number  of  strings  of 
letters  a,  b,  c containing  exactly 

A occurrences  of  the  letter  a; 

B occurrences  of  the  letter  b; 

C occurrences  of  the  letter  c; 
k occurrences  of  the  adjacent  pair  of  letters  ca; 
l occurrences of the adjacent pair of letters cb;
m occurrences  of  the  adjacent  pair  of  letters  ba.  (18) 
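The theorem tells us that this is the same as the number of two-line arrays of the form

$$\begin{pmatrix} \overbrace{a\ \cdots\ a}^{A} & \overbrace{b\ \cdots\ b}^{B} & \overbrace{c\ \cdots\ c}^{C} \\ \underbrace{\qquad\quad}_{A-k-m\ a\text{'s}} & \underbrace{\qquad\quad}_{m\ a\text{'s}} & \underbrace{\qquad\quad}_{k\ a\text{'s},\ l\ b\text{'s}} \end{pmatrix}, \qquad (19)$$

with $B-l$ $b$'s in the bottom line below the $a$'s and $b$'s of the top line. The $a$'s can be placed in the second line in

$$\binom{A}{A-k-m}\binom{B}{m}\binom{C}{k} \text{ ways;}$$

then the $b$'s can be placed in the remaining positions in

$$\binom{B+k}{B-l}\binom{C-k}{l} \text{ ways.}$$

The positions that are still vacant must be filled by $c$'s; hence the desired number is

$$\binom{A}{A-k-m}\binom{B}{m}\binom{C}{k}\binom{B+k}{B-l}\binom{C-k}{l}. \qquad (20)$$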
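A brute-force count is a useful sanity check on the number just obtained. The sketch below enumerates all distinct strings for small $A$, $B$, $C$, tallies the adjacent pairs ca, cb, ba in each, and compares the tally with the product of binomial coefficients. The helper `binom` is an assumption of this sketch (not a library function): it returns 0 when either argument is negative, because `math.comb` raises an error in that case. The other names are hypothetical too.

```python
from collections import Counter
from itertools import permutations
from math import comb


def binom(n, k):
    """Binomial coefficient that is simply 0 when n < 0 or k < 0."""
    return comb(n, k) if n >= 0 and k >= 0 else 0


def pair_tally(A, B, C):
    """Map (k, l, m) = (# of ca, # of cb, # of ba) to the number of distinct
    strings with A a's, B b's and C c's having exactly those pair counts."""
    tally = Counter()
    for p in set(permutations("a" * A + "b" * B + "c" * C)):
        w = "".join(p)
        pairs = Counter(w[i:i + 2] for i in range(len(w) - 1))
        tally[pairs["ca"], pairs["cb"], pairs["ba"]] += 1
    return tally


def count_via_theorem_b(A, B, C, k, l, m):
    """The product of binomial coefficients obtained through Theorem B."""
    return (binom(A, A - k - m) * binom(B, m) * binom(C, k)
            * binom(B + k, B - l) * binom(C - k, l))


print(pair_tally(2, 2, 2)[1, 1, 1], count_via_theorem_b(2, 2, 2, 1, 1, 1))   # 12 12
```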


Let us return to the question of finding all factorizations of a given permutation. Is there such a thing as a “prime” permutation, one that has no intercalation factors except itself and $\epsilon$? The discussion preceding Theorem A leads us quickly to conclude that a permutation is prime if and only if it is a cycle with no repeated elements. For if it is such a cycle, our argument proves that there are no left factors except $\epsilon$ and the cycle itself. And if a permutation contains a repeated element $y$, it has a nontrivial cyclic left factor in which $y$ appears only once.

A nonprime  permutation  can  be  factored  into  smaller  and  smaller  pieces 
until  it  has  been  expressed  as  a product  of  primes.  Furthermore  we  can  show 
that  the  factorization  is  unique,  if  we  neglect  the  order  of  factors  that  commute: 

Theorem C. Every permutation of a multiset can be written as a product

$$\sigma_1 \intercal \sigma_2 \intercal \cdots \intercal \sigma_t, \qquad t \ge 0, \qquad (21)$$

where each $\sigma_j$ is a cycle having no repeated elements. This representation is unique, in the sense that any two such representations of the same permutation may be transformed into each other by successively interchanging pairs of adjacent disjoint cycles.
The term “disjoint cycles” means cycles having no elements in common. As an example of this theorem, we can verify that the permutation

$$\begin{pmatrix} a & a & b & b & c & c & d \\ b & a & a & c & d & b & c \end{pmatrix}$$

has exactly five factorizations into primes, namely

$$\begin{aligned} (a\,b) \intercal (a) \intercal (c\,d) \intercal (b\,c) &= (a\,b) \intercal (c\,d) \intercal (a) \intercal (b\,c) \\ &= (a\,b) \intercal (c\,d) \intercal (b\,c) \intercal (a) \\ &= (c\,d) \intercal (a\,b) \intercal (b\,c) \intercal (a) \\ &= (c\,d) \intercal (a\,b) \intercal (a) \intercal (b\,c). \end{aligned} \qquad (22)$$
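The five factorizations in (22) can be confirmed by computer. In this sketch (the helper name is hypothetical), a cycle $(x_1 \ldots x_n)$ contributes the columns $\binom{x_1}{x_2}, \ldots, \binom{x_n}{x_1}$, and an intercalation product concatenates the columns of its factors and sorts them stably by their top line; Python's `list.sort` is guaranteed stable, which is exactly what intercalation requires. Trying all $4! = 24$ orders of the four primes picks out the orders that yield the same permutation.

```python
from itertools import permutations


def intercalate(*cycles):
    """Bottom line of cycle_1 ⊺ cycle_2 ⊺ ... for cycles written as strings."""
    columns = []
    for cycle in cycles:
        n = len(cycle)
        columns += [(cycle[i], cycle[(i + 1) % n]) for i in range(n)]   # x_i over x_{i+1}
    columns.sort(key=lambda col: col[0])      # stable sort: equal tops keep factor order
    return "".join(bottom for _, bottom in columns)


primes = ["ab", "a", "cd", "bc"]              # (a b), (a), (c d), (b c)
target = intercalate(*primes)
print(target)                                 # baacdbc
orders = [order for order in permutations(primes) if intercalate(*order) == target]
print(len(orders))                            # 5
for order in orders:
    print(" ⊺ ".join("(" + " ".join(c) + ")" for c in order))
```

The count 5 is also the number of topological sorts mentioned in exercise 11.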

Proof. We must show that the stated uniqueness property holds. By induction on the length of the permutation, it suffices to prove that if $\rho$ and $\sigma$ are unequal cycles having no repeated elements, and if

$$\rho \intercal \alpha = \sigma \intercal \beta,$$

then $\rho$ and $\sigma$ are disjoint, and

$$\alpha = \sigma \intercal \theta, \qquad \beta = \rho \intercal \theta,$$

for some permutation $\theta$.

If $y$ is any element of the cycle $\rho$, then any left factor of $\sigma \intercal \beta$ containing the element $y$ must have $\rho$ as a left factor. So if $\rho$ and $\sigma$ have an element in common, $\sigma$ is a multiple of $\rho$; hence $\sigma = \rho$ (since they are primes), contradicting our assumption. Therefore the cycle containing $y$, having no elements in common with $\sigma$, must be a left factor of $\beta$. The proof is completed by using the cancellation law (7). ∎

As an example of Theorem C, let us consider permutations of the multiset $M = \{A\cdot a, B\cdot b, C\cdot c\}$ consisting of $A$ $a$'s, $B$ $b$'s, and $C$ $c$'s. Let $N(A,B,C,m)$ be the number of permutations of $M$ whose two-line representation contains no columns of the forms $\binom{a}{a}$, $\binom{b}{b}$, $\binom{c}{c}$, and exactly $m$ columns of the form $\binom{a}{b}$. It follows that there are exactly $A-m$ columns of the form $\binom{a}{c}$, $B-m$ of the form $\binom{c}{b}$, $C-B+m$ of the form $\binom{c}{a}$, $C-A+m$ of the form $\binom{b}{c}$, and $A+B-C-m$ of the form $\binom{b}{a}$. Hence

$$N(A,B,C,m) = \binom{A}{m}\binom{B}{C-A+m}\binom{C}{B-m}. \qquad (23)$$

Theorem C tells us that we can count these permutations in another way: Since columns of the form $\binom{a}{a}$, $\binom{b}{b}$, $\binom{c}{c}$ are excluded, the only possible prime factors of the permutation are

$$(a\,b), \quad (a\,c), \quad (b\,c), \quad (a\,b\,c), \quad (a\,c\,b). \qquad (24)$$

Each pair of these cycles has at least one letter in common, so the factorization into primes is completely unique. If the cycle $(a\,b\,c)$ occurs $k$ times in the factorization, our previous assumptions imply that $(a\,b)$ occurs $m-k$ times, $(b\,c)$ occurs $C-A+m-k$ times, $(a\,c)$ occurs $C-B+m-k$ times, and $(a\,c\,b)$ occurs $A+B-C-2m+k$ times. Hence $N(A,B,C,m)$ is the number of permutations of these cycles (a multinomial coefficient), summed over $k$:

$$\begin{aligned} N(A,B,C,m) &= \sum_k \frac{(C+m-k)!}{(m-k)!\,(C-A+m-k)!\,(C-B+m-k)!\,k!\,(A+B-C-2m+k)!} \\ &= \sum_k \binom{A}{m}\binom{m}{k}\binom{A-m}{C-B+m-k}\binom{C+m-k}{A}. \end{aligned} \qquad (25)$$


Comparing this with (23), we find that the following identity must be valid:

$$\sum_k \binom{m}{k}\binom{A-m}{C-B+m-k}\binom{C+m-k}{A} = \binom{B}{C-A+m}\binom{C}{B-m}. \qquad (26)$$

This turns out to be the identity we met in exercise 1.2.6-31, namely

$$\sum_j \binom{M-R+S}{j}\binom{N+R-S}{N-j}\binom{R+j}{M+N} = \binom{R}{M}\binom{S}{N}, \qquad (27)$$

with $M = A+B-C-m$, $N = C-B+m$, $R = B$, $S = C$, and $j = C-B+m-k$.
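Since (23) and (25) were obtained in quite different ways, a numerical comparison is reassuring. The sketch below (hypothetical names; `binom` is assumed to vanish for negative arguments) counts the permutations directly from their two-line form, compares the result with (23) and with the multinomial sum (25), and also checks the binomial identity that results from comparing them, in the range $0 \le m \le A$ where dividing by $\binom{A}{m}$ is legitimate.

```python
from itertools import permutations
from math import comb, factorial, prod


def binom(n, k):
    """Binomial coefficient that is simply 0 when n < 0 or k < 0."""
    return comb(n, k) if n >= 0 and k >= 0 else 0


def N_brute(A, B, C, m):
    """Permutations of {A.a, B.b, C.c} with no column (x over x) and exactly m columns (a over b)."""
    letters = "a" * A + "b" * B + "c" * C
    top = sorted(letters)
    count = 0
    for bottom in set(permutations(letters)):
        columns = list(zip(top, bottom))
        if all(t != b for t, b in columns) and columns.count(("a", "b")) == m:
            count += 1
    return count


def N_23(A, B, C, m):
    return binom(A, m) * binom(B, C - A + m) * binom(C, B - m)


def N_25(A, B, C, m):
    total = 0
    for k in range(m + 1):
        # multiplicities of the primes (a b), (b c), (a c), (a b c), (a c b)
        parts = [m - k, C - A + m - k, C - B + m - k, k, A + B - C - 2 * m + k]
        if min(parts) >= 0:
            total += factorial(C + m - k) // prod(factorial(p) for p in parts)
    return total


def lhs_identity(A, B, C, m):
    return sum(binom(m, k) * binom(A - m, C - B + m - k) * binom(C + m - k, A)
               for k in range(m + 1))


def rhs_identity(A, B, C, m):
    return binom(B, C - A + m) * binom(C, B - m)


print(N_brute(2, 2, 2, 1), N_23(2, 2, 2, 1), N_25(2, 2, 2, 1))   # 8 8 8
```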

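Similarly we can count the number of permutations of $\{A\cdot a, B\cdot b, C\cdot c, D\cdot d\}$ such that the number of columns of various types is specified as follows:

$$\begin{array}{lcccccccc} \text{Column type:} & \binom{a}{d} & \binom{a}{b} & \binom{b}{a} & \binom{b}{c} & \binom{c}{b} & \binom{c}{d} & \binom{d}{a} & \binom{d}{c} \\[4pt] \text{Frequency:} & r & A-r & q & B-q & B-A+r & D-r & A-q & D-A+q \end{array} \qquad (28)$$

(Here $A + C = B + D$.) The possible cycles occurring in a prime factorization of such permutations are then

$$\begin{array}{lcccccc} \text{Cycle:} & (a\,b) & (b\,c) & (c\,d) & (d\,a) & (a\,b\,c\,d) & (d\,c\,b\,a) \\[4pt] \text{Frequency:} & A-r-s & B-q-s & D-r-s & A-q-s & s & q-A+r+s \end{array} \qquad (29)$$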

for  some  s (see  exercise  12).  In  this  case  the  cycles  (a  b)  and  (c  d)  commute  with 
each  other,  and  so  do  (b  c)  and  (d  a),  so  we  must  count  the  number  of  distinct 
prime  factorizations.  It  turns  out  (see  exercise  10)  that  there  is  always  a unique 
factorization  such  that  no  (c  d)  is  immediately  followed  by  (a  b),  and  no  (d  a)  is 
immediately  followed  by  (b  c).  Hence  by  the  result  of  exercise  13,  we  have 

$$\sum_{s,t}\binom{B}{t}\binom{A-q-s}{A-r-s-t}\binom{B+D-r-s-t}{B-q-s}\frac{D!}{(D-r-s)!\,(A-q-s)!\,s!\,(q-A+r+s)!} = \binom{A}{r}\binom{B}{q}\binom{C}{D-r}\binom{D}{A-q}. \qquad (30)$$

Taking out the factor $\binom{D}{A-q}$ from both sides and simplifying the factorials slightly leaves us with the complicated-looking five-parameter identity

$$\sum_{s,t}\binom{B}{t}\binom{A-r-t}{s}\binom{B+D-r-s-t}{D+q-r-t}\binom{D-A+q}{D-r-s}\binom{A-q}{r+t-q} = \binom{A}{r}\binom{B}{q}\binom{B+D-A}{D-r}. \qquad (31)$$

The  sum  on  s can  be  performed  using  (27),  and  the  resulting  sum  on  t is  easily 
evaluated;  so,  after  all  this  work,  we  were  not  fortunate  enough  to  discover  any 
identities  that  we  didn’t  already  know  how  to  derive.  But  at  least  we  have 
learned  how  to  count  certain  kinds  of  permutations,  in  two  different  ways,  and 
these  counting  techniques  are  good  training  for  the  problems  that  lie  ahead. 
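The complicated-looking five-parameter identity can at least be spot-checked by machine. In this sketch (hypothetical names), $C$ is eliminated via $C = B + D - A$, both sides are evaluated with a `binom` that vanishes outside $0 \le k \le n$, and only parameter values for which every column frequency in the table of column types is nonnegative are tried.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient that is simply 0 when n < 0 or k < 0."""
    return comb(n, k) if n >= 0 and k >= 0 else 0


def lhs_five(A, B, D, q, r):
    """Double sum over s and t of the five-parameter identity."""
    return sum(binom(B, t) * binom(A - r - t, s) * binom(B + D - r - s - t, D + q - r - t)
               * binom(D - A + q, D - r - s) * binom(A - q, r + t - q)
               for s in range(A + 1) for t in range(B + 1))


def rhs_five(A, B, D, q, r):
    return binom(A, r) * binom(B, q) * binom(B + D - A, D - r)


def frequencies_ok(A, B, D, q, r):
    """Every column frequency r, A-r, q, B-q, B-A+r, D-r, A-q, D-A+q is nonnegative."""
    return min(r, A - r, q, B - q, B - A + r, D - r, A - q, D - A + q) >= 0


cases = [(A, B, D, q, r) for A in range(6) for B in range(6) for D in range(6)
         for q in range(B + 1) for r in range(A + 1) if frequencies_ok(A, B, D, q, r)]
print(lhs_five(2, 2, 2, 1, 1), rhs_five(2, 2, 2, 1, 1))   # 8 8
```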

EXERCISES 

1. [M05] True or false: Let $M_1$ and $M_2$ be multisets. If $\alpha$ is a permutation of $M_1$ and $\beta$ is a permutation of $M_2$, then $\alpha \intercal \beta$ is a permutation of $M_1 \cup M_2$.

2. [10] The intercalation of c a d a b and b d d a d is computed in (5); find the intercalation b d d a d $\intercal$ c a d a b that is obtained when the factors are interchanged.

3. [M13] Is the converse of (9) valid? In other words, if $\alpha$ and $\beta$ commute under intercalation, must they have no letters in common?

4. [M11] The canonical factorization of (12), in the sense of Theorem A, is given in (17) when $a < b < c < d$. Find the corresponding canonical factorization when $d < c < b < a$.

5. [M23] Condition (b) of Theorem B requires $x < y$; what would happen if we weakened the relation to $x \le y$?

6. [M15] How many strings are there that contain exactly $m$ $a$'s, $n$ $b$'s, and no other letters, with exactly $k$ of the $a$'s preceded immediately by a $b$?

7. [M21] How many strings on the letters $a$, $b$, $c$ satisfying conditions (18) begin with the letter $a$? with the letter $b$? with $c$?

► 8. [20] Find all factorizations of (12) into two factors $\alpha \intercal \beta$.

9. [33] Write computer programs that perform the factorizations of a given multiset permutation into the forms mentioned in Theorems A and C.

► 10. [M30] True or false: Although the factorization into primes isn't quite unique, according to Theorem C, we can ensure uniqueness in the following way: “There is a linear ordering $\prec$ of the set of primes such that every permutation of a multiset has a unique factorization $\sigma_1 \intercal \sigma_2 \intercal \cdots \intercal \sigma_n$ into primes subject to the condition that $\sigma_i \prec \sigma_{i+1}$ whenever $\sigma_i$ commutes with $\sigma_{i+1}$, for $1 \le i < n$.”

► 11. [M26] Let $\sigma_1, \sigma_2, \ldots, \sigma_t$ be cycles without repeated elements. Define a partial ordering $\prec$ on the $t$ objects $\{x_1, \ldots, x_t\}$ by saying that $x_i \prec x_j$ if $i < j$ and $\sigma_i$ has at least one letter in common with $\sigma_j$. Prove the following connection between Theorem C and the notion of “topological sorting” (Section 2.2.3): The number of distinct prime factorizations of $\sigma_1 \intercal \sigma_2 \intercal \cdots \intercal \sigma_t$ is the number of ways to sort the given partial ordering topologically. (For example, corresponding to (22) we find that there are five ways to sort the ordering $x_1 \prec x_2$, $x_3 \prec x_4$, $x_1 \prec x_4$ topologically.) Conversely, given any partial ordering on $t$ elements, there is a set of cycles $\{\sigma_1, \sigma_2, \ldots, \sigma_t\}$ that defines it in the stated way.
12. [M16] Show that (29) is a consequence of the assumptions of (28).

13. [M21] Prove that the number of permutations of the multiset

$$\{A\cdot a, B\cdot b, C\cdot c, D\cdot d, E\cdot e, F\cdot f\}$$

containing no occurrences of the adjacent pairs of letters ca and db is

$$\sum_t \binom{D}{A-t}\binom{A+B+E+F}{t}\binom{A+B+C+E+F-t}{B}\binom{C+D+E+F}{C,\,D,\,E,\,F}.$$

14. [M30] One way to define the inverse $\pi^-$ of a general permutation $\pi$, suggested by other definitions in this section, is to interchange the lines of the two-line representation of $\pi$ and then to do a stable sort of the columns in order to bring the top row into nondecreasing order. For example, if $a < b < c < d$, this definition implies that the inverse of c a b d d a b d a d is a c d a d a b b d d.

Explore properties of this inversion operation; for example, does it have any simple relation with intercalation products? Can we count the number of permutations such that $\pi = \pi^-$?

► 15. [M25] Prove that the permutation $a_1 \ldots a_n$ of the multiset

$$\{n_1\cdot x_1, n_2\cdot x_2, \ldots, n_m\cdot x_m\},$$

where $x_1 < x_2 < \cdots < x_m$ and $n_1 + n_2 + \cdots + n_m = n$, is a cycle if and only if the directed graph with vertices $\{x_1, x_2, \ldots, x_m\}$ and arcs from $x_j$ to $a_{n_1+\cdots+n_j}$ contains precisely one oriented cycle. In the latter case, the number of ways to represent the permutation in cycle form is the length of the oriented cycle. For example, the directed graph corresponding to

$$\begin{pmatrix} a & a & a & b & b & c & c & c & d & d \\ d & c & b & a & c & a & a & b & d & c \end{pmatrix}$$

has the arcs $a \to b$, $b \to c$, $c \to b$, $d \to c$, whose only oriented cycle $b \to c \to b$ has length 2; and the two ways to represent the permutation as a cycle are $(b\,a\,d\,d\,c\,a\,c\,a\,b\,c)$ and $(c\,a\,d\,d\,c\,a\,c\,b\,a\,b)$.

16. [M25] We found the generating function for inversions of permutations in the previous section, Eq. 5.1.1-(8), in the special case that a set was being permuted. Show that, in general, if a multiset is permuted, the generating function for inversions of $\{n_1\cdot x_1, n_2\cdot x_2, \ldots\}$ is the “$z$-multinomial coefficient”

$$\binom{n}{n_1, n_2, \ldots}_z = \frac{n!_z}{n_1!_z\,n_2!_z\ldots}, \qquad \text{where } m!_z = \prod_{k=1}^{m}(1 + z + \cdots + z^{k-1}).$$

[Compare with (3) and with the definition of $z$-nomial coefficients in Eq. 1.2.6-(40).]

17.  [M24]  Find  the  average  and  standard  deviation  of  the  number  of  inversions  in 
a random  permutation  of  a given  multiset,  using  the  generating  function  found  in 
exercise  16. 


18. [M30] (P. A. MacMahon.) The index of a permutation $a_1 a_2 \ldots a_n$ was defined in the previous section; and we proved that the number of permutations of a given set that have a given index $k$ is the same as the number of permutations that have $k$ inversions. Does the same result hold for permutations of a given multiset?
19. [HM28] Define the Möbius function $\mu(\pi)$ of a permutation $\pi$ to be 0 if $\pi$ contains repeated elements, otherwise $(-1)^k$ if $\pi$ is the product of $k$ primes. (Compare with the definition of the ordinary Möbius function, exercise 4.5.2-10.)

a) Prove that if $\pi \ne \epsilon$, we have

$$\sum \mu(\lambda) = 0,$$

summed over all permutations $\lambda$ that are left factors of $\pi$ (namely all $\lambda$ such that $\pi = \lambda \intercal \rho$ for some $\rho$).

b) Given that $x_1 < x_2 < \cdots < x_m$ and $\pi = x_{i_1} x_{i_2} \ldots x_{i_n}$, where $1 \le i_k \le m$ for $1 \le k \le n$, prove that

$$\mu(\pi) = (-1)^n\,\epsilon(i_1 i_2 \ldots i_n), \qquad \text{where} \quad \epsilon(i_1 i_2 \ldots i_n) = \operatorname{sign}\prod_{1 \le j < k \le n}(i_k - i_j).$$

► 20. [HM33] (D. Foata.) Let $(a_{ij})$ be any matrix of real numbers. In the notation of exercise 19(b), define $\nu(\pi) = a_{i_1 j_1} \ldots a_{i_n j_n}$, where the two-line notation for $\pi$ is

$$\begin{pmatrix} x_{i_1} & x_{i_2} & \ldots & x_{i_n} \\ x_{j_1} & x_{j_2} & \ldots & x_{j_n} \end{pmatrix}.$$

This function is useful in the computation of generating functions for permutations of a multiset, because $\sum \nu(\pi)$, summed over all permutations $\pi$ of the multiset

$$\{n_1\cdot x_1, \ldots, n_m\cdot x_m\},$$

will be the generating function for the number of permutations satisfying certain restrictions. For example, if we take $a_{ij} = z$ for $i = j$, and $a_{ij} = 1$ for $i \ne j$, then $\sum \nu(\pi)$ is the generating function for the number of “fixed points” (columns in which the top and bottom entries are equal). In order to study $\sum \nu(\pi)$ for all multisets simultaneously, we consider the function

$$G = \sum \pi\,\nu(\pi)$$

summed over all $\pi$ in the set $\{x_1, \ldots, x_m\}^*$ of all permutations of multisets involving the elements $x_1, \ldots, x_m$, and we look at the coefficient of $x_1^{n_1} \ldots x_m^{n_m}$ in $G$.

In this formula for $G$ we are treating $\pi$ as the product of the $x$'s. For example, when $m = 2$ we have

$$\begin{aligned} G &= 1 + x_1\nu(x_1) + x_2\nu(x_2) + x_1x_1\nu(x_1x_1) + x_1x_2\nu(x_1x_2) + x_2x_1\nu(x_2x_1) + x_2x_2\nu(x_2x_2) + \cdots \\ &= 1 + x_1a_{11} + x_2a_{22} + x_1^2a_{11}^2 + x_1x_2a_{11}a_{22} + x_1x_2a_{21}a_{12} + x_2^2a_{22}^2 + \cdots. \end{aligned}$$

Thus the coefficient of $x_1^{n_1} \ldots x_m^{n_m}$ in $G$ is $\sum \nu(\pi)$ summed over all permutations $\pi$ of $\{n_1\cdot x_1, \ldots, n_m\cdot x_m\}$. It is not hard to see that this coefficient is also the coefficient of $x_1^{n_1} \ldots x_m^{n_m}$ in the expression

$$(a_{11}x_1 + \cdots + a_{1m}x_m)^{n_1}(a_{21}x_1 + \cdots + a_{2m}x_m)^{n_2} \ldots (a_{m1}x_1 + \cdots + a_{mm}x_m)^{n_m}.$$

The purpose of this exercise is to prove what P. A. MacMahon called a “Master Theorem” in his Combinatory Analysis 1 (1915), Section 3, namely the formula

$$G = 1/D, \qquad \text{where} \qquad D = \det\begin{pmatrix} 1 - a_{11}x_1 & -a_{12}x_2 & \ldots & -a_{1m}x_m \\ -a_{21}x_1 & 1 - a_{22}x_2 & \ldots & -a_{2m}x_m \\ \vdots & \vdots & & \vdots \\ -a_{m1}x_1 & -a_{m2}x_2 & \ldots & 1 - a_{mm}x_m \end{pmatrix}.$$

For example, if $a_{ij} = 1$ for all $i$ and $j$, this formula gives

$$G = 1/(1 - (x_1 + x_2 + \cdots + x_m)),$$

and the coefficient of $x_1^{n_1} \ldots x_m^{n_m}$ turns out to be $(n_1 + \cdots + n_m)!/n_1! \ldots n_m!$, as it should. To prove the Master Theorem, show that

a) $\nu(\pi \intercal \rho) = \nu(\pi)\,\nu(\rho)$;

b) $D = \sum \pi\,\mu(\pi)\,\nu(\pi)$, in the notation of exercise 19, summed over all permutations $\pi$ in $\{x_1, \ldots, x_m\}^*$;

c) therefore $D \cdot G = 1$.

21. [M21] Given $n_1, \ldots, n_m$, and $d \ge 0$, how many permutations $a_1 a_2 \ldots a_n$ of the multiset $\{n_1\cdot 1, \ldots, n_m\cdot m\}$ satisfy $a_{j+1} \ge a_j - d$ for $1 \le j < n = n_1 + \cdots + n_m$?

22. [M30] Let $P(x_1^{n_1} \ldots x_m^{n_m})$ denote the set of all possible permutations of the multiset $\{n_1\cdot x_1, \ldots, n_m\cdot x_m\}$, and let $P_0(x_0^{n_0} x_1^{n_1} \ldots x_m^{n_m})$ be the subset of $P(x_0^{n_0} x_1^{n_1} \ldots x_m^{n_m})$ in which the first $n_0$ elements are $\ne x_0$.

a) Given a number $t$ with $1 \le t < m$, find a one-to-one correspondence between $P(1^{n_1} \ldots m^{n_m})$ and the set of all ordered pairs of permutations that belong respectively to $P_0(0^k 1^{n_1} \ldots t^{n_t})$ and $P_0(0^k (t+1)^{n_{t+1}} \ldots m^{n_m})$, for some $k \ge 0$. [Hint: For each $\pi = a_1 \ldots a_n \in P(1^{n_1} \ldots m^{n_m})$, let $l(\pi)$ be the permutation obtained by replacing $t+1, \ldots, m$ by 0 and erasing all 0s in the last $n_{t+1} + \cdots + n_m$ positions; similarly, let $r(\pi)$ be the permutation obtained by replacing $1, \ldots, t$ by 0 and erasing all 0s in the first $n_1 + \cdots + n_t$ positions.]

b) Prove that the number of permutations of $P_0(0^{n_0} 1^{n_1} \ldots m^{n_m})$ whose two-line form has $p_j$ columns $\binom{0}{j}$ and $q_j$ columns $\binom{j}{0}$ is

$$\frac{|P(x_1^{p_1} \ldots x_m^{p_m}\,y_1^{n_1-p_1} \ldots y_m^{n_m-p_m})|\;|P(x_1^{q_1} \ldots x_m^{q_m}\,y_1^{n_1-q_1} \ldots y_m^{n_m-q_m})|}{|P_0(0^{n_0} 1^{n_1} \ldots m^{n_m})|}.$$

c) Let $w_1, \ldots, w_m, z_1, \ldots, z_m$ be complex numbers on the unit circle. Define the weight $w(\pi)$ of a permutation $\pi \in P(1^{n_1} \ldots m^{n_m})$ as the product of the weights of its columns in two-line form, where the weight of $\binom{j}{k}$ is $w_j/w_k$ if $j$ and $k$ are both $\le t$ or both $> t$, otherwise it is $z_j/z_k$. Prove that the sum of $w(\pi)$ over all $\pi \in P(1^{n_1} \ldots m^{n_m})$ is

$$\sum_k \frac{k!^2\,(n_{\le t} - k)!\,(n_{>t} - k)!}{n_1! \ldots n_m!} \sum \cdots,$$

where $n_{\le t}$ is $n_1 + \cdots + n_t$, $n_{>t}$ is $n_{t+1} + \cdots + n_m$, and the inner sum is over all $(p_1, \ldots, p_m)$ such that $p_{\le t} = p_{>t} = k$.

23. [M23] A strand of DNA can be thought of as a word on a four-letter alphabet. Suppose we copy a strand of DNA and break it completely into one-letter bases, then recombine those bases at random. If the resulting strand is placed next to the original, prove that the number of places in which they differ is more likely to be even than odd. [Hint: Apply the previous exercise.]

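24. [27] Consider any relation $R$ that might hold between two unordered pairs of letters; if $\{w,x\}R\{y,z\}$ we say $\{w,x\}$ preserves $\{y,z\}$, otherwise $\{w,x\}$ moves $\{y,z\}$.

The operation of transposing $\begin{pmatrix} w & y \\ x & z \end{pmatrix}$ with respect to $R$ replaces it by $\begin{pmatrix} y & w \\ x & z \end{pmatrix}$ or $\begin{pmatrix} y & w \\ z & x \end{pmatrix}$, according as the pair $\{w,x\}$ preserves or moves the pair $\{y,z\}$, assuming that $w \ne x$ and $y \ne z$; if $w = x$ or $y = z$ the transposition always produces $\begin{pmatrix} y & w \\ z & x \end{pmatrix}$.

The operation of sorting a two-line array $\begin{pmatrix} x_1 & \ldots & x_n \\ y_1 & \ldots & y_n \end{pmatrix}$ with respect to $R$ repeatedly finds the largest $x_j$ such that $x_j > x_{j+1}$ and transposes columns $j$ and $j+1$, until eventually $x_1 \le \cdots \le x_n$. (We do not require $y_1 \ldots y_n$ to be a permutation of $x_1 \ldots x_n$.)

a) Given $\begin{pmatrix} x_1 & \ldots & x_n \\ y_1 & \ldots & y_n \end{pmatrix}$, prove that for every $x \in \{x_1, \ldots, x_n\}$ there is a unique $y \in \{y_1, \ldots, y_n\}$ such that $\operatorname{sort}\begin{pmatrix} x_1 & \ldots & x_n \\ y_1 & \ldots & y_n \end{pmatrix} = \operatorname{sort}\begin{pmatrix} x & x_2' & \ldots & x_n' \\ y & y_2' & \ldots & y_n' \end{pmatrix}$ for some $x_2', y_2', \ldots, x_n', y_n'$.

b) Let $\begin{pmatrix} w_1 & \ldots & w_k \\ y_1 & \ldots & y_k \end{pmatrix} \circledR \begin{pmatrix} x_1 & \ldots & x_l \\ z_1 & \ldots & z_l \end{pmatrix}$ denote the result of sorting $\begin{pmatrix} w_1 & \ldots & w_k & x_1 & \ldots & x_l \\ y_1 & \ldots & y_k & z_1 & \ldots & z_l \end{pmatrix}$ with respect to $R$. For example, if $R$ is always true, $\circledR$ sorts $\{w_1, \ldots, w_k, x_1, \ldots, x_l\}$, but it simply juxtaposes $y_1 \ldots y_k$ with $z_1 \ldots z_l$; if $R$ is always false, $\circledR$ is the intercalation product $\intercal$. Generalize Theorem A by proving that every permutation $\pi$ of a multiset $M$ has a unique representation of the form

$$\pi = (x_{11} \ldots x_{1n_1}\,y_1) \circledR \bigl((x_{21} \ldots x_{2n_2}\,y_2) \circledR (\cdots \circledR (x_{t1} \ldots x_{tn_t}\,y_t) \cdots)\bigr)$$

satisfying (16), if we redefine cycle notation by letting the two-line array (11) correspond to the cycle $(x_2 \ldots x_n\,x_1)$ instead of to $(x_1\,x_2 \ldots x_n)$. For example, suppose $\{w,x\}R\{y,z\}$ means that $w$, $x$, $y$, and $z$ are distinct; then it turns out that the factorization of (12) analogous to (17) is

$$(d\,d\,b\,c\,a) \circledR \bigl((c\,b\,b\,a) \circledR ((c\,d\,b) \circledR ((d\,b) \circledR (d)))\bigr).$$

(The operation $\circledR$ does not always obey the associative law; parentheses in the generalized factorization should be nested from right to left.)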

*5.1.3.  Runs 

In  Chapter  3 we  analyzed  the  lengths  of  upward  runs  in  permutations,  as  a way 
to  test  the  randomness  of  a sequence.  If  we  place  a vertical  line  at  both  ends 
of a permutation $a_1 a_2 \ldots a_n$ and also between $a_j$ and $a_{j+1}$ whenever $a_j > a_{j+1}$,
the  runs  are  the  segments  between  pairs  of  lines.  For  example,  the  permutation 

| 3 5 7 | 1 6 8 9 | 4 | 2 |

has  four  runs.  The  theory  developed  in  Section  3.3.2G  determines  the  average 
number  of  runs  of  length  k in  a random  permutation  of  {1,2,...,  n},  as  well  as 
the  covariance  of  the  numbers  of  runs  of  lengths  j and  k.  Runs  are  important  in 
the  study  of  sorting  algorithms,  because  they  represent  sorted  segments  of  the 
data,  so  we  will  now  take  up  the  subject  of  runs  once  again. 

Let us use the notation

$$\left\langle {n \atop k} \right\rangle \qquad (1)$$

to stand for the number of permutations of $\{1, 2, \ldots, n\}$ that have exactly $k$ “descents” $a_j > a_{j+1}$, thus exactly $k+1$ ascending runs. These numbers $\left\langle {n \atop k} \right\rangle$ arise in several contexts, and they are usually called Eulerian numbers since Euler discussed them in his famous book Institutiones Calculi Differentialis (St. Petersburg: 1755), 485-487, after having introduced them several years earlier in a technical paper [Comment. Acad. Sci. Imp. Petrop. 8 (1736), 147-158, §13]; they should not be confused with the Euler numbers $E_n$ discussed in exercise 5.1.4-23. The angle brackets in $\left\langle {n \atop k} \right\rangle$ remind us of the “>” sign in the definition of a descent. Of course $\left\langle {n \atop k} \right\rangle$ is also the number of permutations that have $k$ “ascents” $a_j < a_{j+1}$.
We can use any given permutation of $\{1, \ldots, n-1\}$ to form $n$ new permutations, by inserting the element $n$ in all possible places. If the original permutation has $k$ descents, exactly $k+1$ of these new permutations will have $k$ descents; the remaining $n-1-k$ will have $k+1$, since we increase the number of descents unless we place the element $n$ at the end of an existing run. For example, the six permutations formed from 3 1 2 4 5 are

6 3 1 2 4 5,    3 6 1 2 4 5,    3 1 6 2 4 5,

3 1 2 6 4 5,    3 1 2 4 6 5,    3 1 2 4 5 6;

all  but  the  second  and  last  of  these  have  two  descents  instead  of  one.  Therefore 
we  have  the  recurrence  relation 

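$$\left\langle {n \atop k} \right\rangle = (k+1)\left\langle {n-1 \atop k} \right\rangle + (n-k)\left\langle {n-1 \atop k-1} \right\rangle, \qquad \text{integer } n > 0, \text{ integer } k. \qquad (2)$$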

By convention we set

$$\left\langle {0 \atop k} \right\rangle = \delta_{k0}, \qquad (3)$$

saying that the null permutation has no descents. The reader may find it interesting to compare (2) with the recurrence relations for Stirling numbers in Eqs. 1.2.6-(46). Table 1 lists the Eulerian numbers for small $n$.

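Several patterns can be observed in Table 1. By definition, we have

$$\left\langle {n \atop 0} \right\rangle + \left\langle {n \atop 1} \right\rangle + \cdots + \left\langle {n \atop n} \right\rangle = n!; \qquad (4)$$

$$\left\langle {n \atop 0} \right\rangle = 1, \qquad \left\langle {n \atop n} \right\rangle = 0, \qquad \text{for } n \ge 1; \qquad (5)$$

$$\left\langle {n \atop n-1} \right\rangle = 1, \qquad \text{for } n \ge 1. \qquad (6)$$

Eq. (6) follows from (5) because of a general rule of symmetry,

$$\left\langle {n \atop k} \right\rangle = \left\langle {n \atop n-1-k} \right\rangle, \qquad \text{for } n \ge 1, \qquad (7)$$

which comes from the fact that each nonnull permutation $a_1 a_2 \ldots a_n$ having $k$ descents has $n-1-k$ ascents.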
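Recurrence (2), started from the convention that the null permutation has no descents, is all that is needed to generate Table 1. The following sketch (function names are hypothetical) builds the rows that way and cross-checks them against a brute-force count of descents, the row sums $n!$, and the symmetry (7).

```python
from itertools import permutations
from math import factorial


def eulerian_rows(n_max):
    """rows[n][k] = <n k> for 0 <= k <= n, built from recurrence (2)."""
    rows = [[1]]                                    # <0 0> = 1, all other <0 k> = 0
    for n in range(1, n_max + 1):
        prev = rows[-1]

        def old(k):                                 # <n-1 k>, zero outside the stored range
            return prev[k] if 0 <= k < len(prev) else 0

        rows.append([(k + 1) * old(k) + (n - k) * old(k - 1) for k in range(n + 1)])
    return rows


def descent_counts(n):
    """Brute force: how many permutations of {1..n} have k descents, k = 0..n."""
    counts = [0] * (n + 1)
    for p in permutations(range(1, n + 1)):
        counts[sum(1 for x, y in zip(p, p[1:]) if x > y)] += 1
    return counts


rows = eulerian_rows(9)
print(rows[9][:9])     # [1, 502, 14608, 88234, 156190, 88234, 14608, 502, 1]
```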

Another important property of the Eulerian numbers is the formula

$$\sum_k \left\langle {n \atop k} \right\rangle \binom{m+k}{n} = m^n, \qquad n \ge 0, \qquad (8)$$

which was discovered by the Chinese mathematician Li Shan-Lan and published in 1867. [See J.-C. Martzloff, A History of Chinese Mathematics (Berlin: Springer, 1997), 346-348; special cases for n < 5 had already been known to Yoshisuke Matsunaga in Japan, who died in 1744.] Li Shan-Lan's identity follows from the properties of sorting: Consider the $m^n$ sequences $a_1 a_2 \ldots a_n$ such that $1 \le a_i \le m$. We can sort any such sequence into nondecreasing order in a stable manner, obtaining

$$a_{i_1} \le a_{i_2} \le \cdots \le a_{i_n}, \qquad (9)$$
Table 1
EULERIAN NUMBERS (the entry in row n, column k is $\left\langle {n \atop k} \right\rangle$)

  n    k=0    k=1    k=2    k=3    k=4    k=5    k=6    k=7    k=8
  0      1      0      0      0      0      0      0      0      0
  1      1      0      0      0      0      0      0      0      0
  2      1      1      0      0      0      0      0      0      0
  3      1      4      1      0      0      0      0      0      0
  4      1     11     11      1      0      0      0      0      0
  5      1     26     66     26      1      0      0      0      0
  6      1     57    302    302     57      1      0      0      0
  7      1    120   1191   2416   1191    120      1      0      0
  8      1    247   4293  15619  15619   4293    247      1      0
  9      1    502  14608  88234 156190  88234  14608    502      1

where $i_1 i_2 \ldots i_n$ is a uniquely determined permutation of $\{1, 2, \ldots, n\}$ such that $a_{i_j} = a_{i_{j+1}}$ implies $i_j < i_{j+1}$; in other words, $i_j > i_{j+1}$ implies that $a_{i_j} < a_{i_{j+1}}$. If the permutation $i_1 i_2 \ldots i_n$ has $k$ runs, we will show that the number of corresponding sequences $a_1 a_2 \ldots a_n$ is $\binom{m+n-k}{n}$. This will prove (8) if we replace $k$ by $n-k$ and use (7), because $\left\langle {n \atop k} \right\rangle$ permutations have $n-k$ runs.

For example, if $n = 9$ and $i_1 i_2 \ldots i_n = 3\,5\,7\,1\,6\,8\,9\,4\,2$, we want to count the number of sequences $a_1 a_2 \ldots a_n$ such that

$$1 \le a_3 \le a_5 \le a_7 < a_1 \le a_6 \le a_8 \le a_9 < a_4 < a_2 \le m; \qquad (10)$$

this is the number of sequences $b_1 b_2 \ldots b_9$ such that

$$1 \le b_1 < b_2 < b_3 < b_4 < b_5 < b_6 < b_7 < b_8 < b_9 \le m + 5,$$

since we can let $b_1 = a_3$, $b_2 = a_5 + 1$, $b_3 = a_7 + 2$, $b_4 = a_1 + 2$, $b_5 = a_6 + 3$, etc. The number of choices of the $b$'s is simply the number of ways of choosing 9 things out of $m + 5$, namely $\binom{m+5}{9}$; a similar proof works for general $n$ and $k$, and for any permutation $i_1 i_2 \ldots i_n$ with $k$ runs.
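The counting argument behind (8) can be replayed by brute force for small $n$ and $m$. The sketch below (names are hypothetical) maps each of the $m^n$ sequences to the permutation $i_1 i_2 \ldots i_n$ of (9) by a stable sort, and checks that a permutation with $k$ runs is hit exactly $\binom{m+n-k}{n}$ times.

```python
from collections import Counter
from itertools import permutations, product
from math import comb


def runs(perm):
    """Number of ascending runs in a nonempty sequence."""
    return 1 + sum(1 for x, y in zip(perm, perm[1:]) if x > y)


def stable_sort_permutation(a):
    """The permutation i_1 ... i_n of (9): 1-based positions of `a` listed in
    stably sorted order (sorted() is stable, so equal values keep index order)."""
    return tuple(sorted(range(1, len(a) + 1), key=lambda i: a[i - 1]))


def check_sorting_argument(n, m):
    """Each permutation with k runs should come from C(m+n-k, n) of the m^n sequences."""
    tally = Counter(stable_sort_permutation(a) for a in product(range(1, m + 1), repeat=n))
    return all(tally[p] == comb(m + n - runs(p), n) for p in permutations(range(1, n + 1)))


print(check_sorting_argument(4, 3))   # True
```

Summing $\binom{m+n-k}{n}$ over all $n!$ permutations then gives $m^n$, which is (8).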

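Since both sides of (8) are polynomials in $m$, we may replace $m$ by any real number $x$, and we obtain an interesting representation of powers in terms of consecutive binomial coefficients:

$$x^n = \left\langle {n \atop 0} \right\rangle\binom{x}{n} + \left\langle {n \atop 1} \right\rangle\binom{x+1}{n} + \cdots + \left\langle {n \atop n-1} \right\rangle\binom{x+n-1}{n}, \qquad n \ge 1. \qquad (11)$$

For example,

$$x^3 = \binom{x}{3} + 4\binom{x+1}{3} + \binom{x+2}{3}.$$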

This  is  the  key  property  of  Eulerian  numbers  that  makes  them  useful  in  the 
study  of  discrete  mathematics. 

Setting $x = 1$ in (11) proves again that $\left\langle {n \atop n-1} \right\rangle = 1$, since the binomial coefficients vanish in all but the last term. Setting $x = 2$ yields

$$\left\langle {n \atop n-2} \right\rangle = 2^n - n - 1, \qquad n \ge 1. \qquad (12)$$
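Setting $x = 3, 4, \ldots$ shows that relation (11) completely defines the numbers $\left\langle {n \atop k} \right\rangle$, and leads to a formula originally given by Euler:

$$\begin{aligned} \left\langle {n \atop k} \right\rangle &= (k+1)^n - k^n\binom{n+1}{1} + (k-1)^n\binom{n+1}{2} - \cdots + (-1)^k 1^n\binom{n+1}{k} \\ &= \sum_{j=0}^{k} (-1)^j \binom{n+1}{j} (k+1-j)^n, \qquad n \ge 0, \ k \ge 0. \end{aligned} \qquad (13)$$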
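Euler's formula (13) gives a direct way to compute individual Eulerian numbers without the recurrence, and the representation (11) and the special case (12) can then be checked for small integers. A minimal sketch, with hypothetical function names:

```python
from math import comb


def eulerian(n, k):
    """Euler's explicit formula (13), for n >= 0 and k >= 0."""
    return sum((-1) ** j * comb(n + 1, j) * (k + 1 - j) ** n for j in range(k + 1))


def rhs_11(x, n):
    """Right-hand side of (11) for an integer x >= 0 and n >= 1."""
    return sum(eulerian(n, k) * comb(x + k, n) for k in range(n))


print([eulerian(4, k) for k in range(5)])   # [1, 11, 11, 1, 0]
print(rhs_11(7, 3))                         # 343
```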

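Now let us study the generating function for runs. If we set

$$g_n(z) = \sum_k \left\langle {n \atop k-1} \right\rangle \frac{z^k}{n!}, \qquad (14)$$

the coefficient of $z^k$ is the probability that a random permutation of $\{1, 2, \ldots, n\}$ has exactly $k$ runs. Since $k$ runs are just as likely as $n+1-k$, the average number of runs must be $\frac12(n+1)$, hence $g_n'(1) = \frac12(n+1)$. Exercise 2(b) shows that there is a simple formula for all the derivatives of $g_n(z)$ at the point $z = 1$:

$$g_n^{(m)}(1) = \left\{ {n+1 \atop n+1-m} \right\} \Big/ \binom{n}{m}, \qquad 0 \le m \le n. \qquad (15)$$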

Thus in particular the variance $g_n''(1) + g_n'(1) - g_n'(1)^2$ comes to $(n+1)/12$, for $n \ge 2$, indicating a rather stable distribution about the mean. (We found this same quantity in Eq. 3.3.2-(18), where it was called $\operatorname{covar}(R_1', R_1')$.) Since $g_n(z)$ is a polynomial, we can use formula (15) to deduce the Taylor series expansions

$$g_n(z) = \frac{1}{n!}\sum_{k=0}^{n} (z-1)^{n-k}\,k!\left\{ {n+1 \atop k+1} \right\} = \frac{1}{n!}\sum_{k=0}^{n} z^{k+1}(1-z)^{n-k}\,k!\left\{ {n+1 \atop k+1} \right\}. \qquad (16)$$

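The second of these equations follows from the first, since

$$g_n(z) = z^{n+1} g_n(1/z), \qquad n \ge 1, \qquad (17)$$

by the symmetry condition (7). The Stirling number recurrence

$$\left\{ {n+1 \atop k+1} \right\} = (k+1)\left\{ {n \atop k+1} \right\} + \left\{ {n \atop k} \right\}$$

gives two slightly simpler representations,

$$g_n(z) = \frac{z}{n!}\sum_k k!\left\{ {n \atop k} \right\}(z-1)^{n-k} = \frac{1}{n!}\sum_k k!\left\{ {n \atop k} \right\} z^k (1-z)^{n-k}, \qquad (18)$$

when $n \ge 1$. The super generating function

$$g(z, x) = \sum_{n \ge 0} \frac{g_n(z)\,x^n}{z} = \sum_{k,n \ge 0} \left\langle {n \atop k} \right\rangle \frac{z^k x^n}{n!} \qquad (19)$$

is therefore equal to

$$\sum_{k,n \ge 0} \frac{((z-1)x)^n}{(z-1)^k}\left\{ {n \atop k} \right\}\frac{k!}{n!} = \sum_{k \ge 0}\left(\frac{e^{(z-1)x} - 1}{z-1}\right)^{k} = \frac{1-z}{e^{(z-1)x} - z}; \qquad (20)$$

this is another relation discussed by Euler.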
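The statements about the run generating function can be verified with exact rational arithmetic. The sketch below (hypothetical names) takes $g_n(z)$ to be the polynomial whose coefficient of $z^k$ is the probability of exactly $k$ runs, computes its derivatives at $z = 1$ directly from the Eulerian numbers using the falling factorial `math.perm(k, m)`, and compares them with the mean $\frac12(n+1)$, the variance $(n+1)/12$, and formula (15).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial, perm


@lru_cache(maxsize=None)
def eulerian(n, k):
    """<n k> from recurrence (2), with <0 k> = 1 if k == 0 else 0."""
    if n == 0:
        return 1 if k == 0 else 0
    if k < 0 or k >= n:
        return 0
    return (k + 1) * eulerian(n - 1, k) + (n - k) * eulerian(n - 1, k - 1)


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind, {n k}."""
    if n == 0:
        return 1 if k == 0 else 0
    if k <= 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def g_derivative_at_1(n, m):
    """m-th derivative at z = 1 of g_n(z) = sum_k <n, k-1> z^k / n!;
    perm(k, m) = k (k-1) ... (k-m+1) is the falling factorial."""
    return Fraction(sum(eulerian(n, k - 1) * perm(k, m) for k in range(1, n + 1)),
                    factorial(n))


n = 5
mean, second = g_derivative_at_1(n, 1), g_derivative_at_1(n, 2)
print(mean, second + mean - mean ** 2)   # 3 1/2
```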

Further properties of the Eulerian numbers may be found in a survey paper by L. Carlitz [Math. Magazine 32 (1959), 247-260]. See also J. Riordan, Introduction to Combinatorial Analysis (New York: Wiley, 1958), 38-39, 214-219, 234-237; D. Foata and M. P. Schützenberger, Lecture Notes in Math. 138 (Berlin: Springer, 1970).

Let  us  now  consider  the  length  of  runs;  how  long  will  a run  be,  on  the 
average?  We  have  already  studied  the  expected  number  of  runs  having  a given 
length,  in  Section  3.3.2;  the  average  run  length  is  approximately  2,  in  agreement 
with the fact that about $\frac12(n+1)$ runs appear in a random permutation of
length $n$. For applications to sorting algorithms, a slightly different viewpoint is
useful; we will consider the length of the $k$th run of the permutation from left to
right,  for  k = 1,  2,  . . . . 

For example, how long is the first (leftmost) run of a random permutation $a_1 a_2 \ldots a_n$? Its length is always $\ge 1$, and its length is $\ge 2$ exactly one-half the time (namely when $a_1 < a_2$). Its length is $\ge 3$ exactly one-sixth of the time (when $a_1 < a_2 < a_3$), and, in general, its length is $\ge m$ with probability $q_m = 1/m!$, for $1 \le m \le n$. The probability that its length is exactly equal to $m$ is therefore

$$p_m = q_m - q_{m+1} = 1/m! - 1/(m+1)!, \qquad \text{for } 1 \le m < n;$$

$$p_n = 1/n!. \qquad (21)$$

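The average length of the first run therefore equals

$$\begin{aligned} p_1 + 2p_2 + \cdots + np_n &= (q_1 - q_2) + 2(q_2 - q_3) + \cdots + (n-1)(q_{n-1} - q_n) + nq_n\\ &= q_1 + q_2 + \cdots + q_n = \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}. \end{aligned} \tag{22}$$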

If we let $n \to \infty$, the limit is $e - 1 = 1.71828\ldots$. For finite $n$ the value is $e - 1 - \delta_n$, where $\delta_n$ is quite small:

$$\delta_n = \frac{1}{(n+1)!}\left(1 + \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} + \cdots\right) \le \frac{e-1}{(n+1)!}.$$

For practical purposes it is therefore convenient to study runs in a random infinite sequence of distinct numbers

$$a_1, a_2, a_3, \ldots;$$

by  “random”  we  mean  in  this  case  that  each  of  the  n!  possible  relative  orderings 
of  the  first  n elements  in  the  sequence  is  equally  likely.  The  average  length  of 
the first run in a random infinite sequence is exactly $e - 1$.
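A brute-force check of (21) and (22) may help here. The Python sketch below is an illustration we added, and its function names are our own. It enumerates all permutations of {1, 2, ..., n} for small n and measures the first run exactly with rational arithmetic. It then compares the average with the partial sum 1/1! + ... + 1/n!, and prints the gap from e − 1, which should respect the bound on δ_n.

```python
import math
from fractions import Fraction
from itertools import permutations


def first_run_length(seq):
    """Length of the leftmost ascending run of seq."""
    m = 1
    while m < len(seq) and seq[m - 1] < seq[m]:
        m += 1
    return m


def mean_first_run_bruteforce(n):
    """Exact average first-run length over all n! permutations of {1, ..., n}."""
    total = sum(first_run_length(p) for p in permutations(range(1, n + 1)))
    return Fraction(total, math.factorial(n))


def mean_first_run_formula(n):
    """Right-hand side of (22): 1/1! + 1/2! + ... + 1/n!."""
    return sum(Fraction(1, math.factorial(m)) for m in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 8):
        assert mean_first_run_bruteforce(n) == mean_first_run_formula(n)
        delta = (math.e - 1) - float(mean_first_run_formula(n))
        print(n, float(mean_first_run_formula(n)), delta)
```

Exact fractions are used so that the equality test checks (22) itself, with no rounding tolerance.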

By slightly sharpening our analysis of the first run, we can find the average length of the $k$th run in a random sequence. Let $q_{km}$ be the probability that the first $k$ runs have total length $\ge m$. Then $q_{km}$ is $1/m!$ times the number of permutations of $\{1, 2, \ldots, m\}$ that have $\le k$ runs:

$$q_{km} = \left(\left\langle m \atop 0 \right\rangle + \cdots + \left\langle m \atop k-1 \right\rangle\right)\Big/ m!. \tag{23}$$


The probability that the first $k$ runs have total length $m$ is $q_{km} - q_{k(m+1)}$. Therefore, if $L_k$ denotes the average length of the $k$th run, we find that

$$\begin{aligned} L_1 + \cdots + L_k &= \text{average total length of first $k$ runs}\\ &= (q_{k1} - q_{k2}) + 2(q_{k2} - q_{k3}) + 3(q_{k3} - q_{k4}) + \cdots\\ &= q_{k1} + q_{k2} + q_{k3} + \cdots. \end{aligned}$$


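Subtracting $L_1 + \cdots + L_{k-1}$ and using the value of $q_{km}$ in (23) yields the desired formula

$$L_k = \frac{1}{1!}\left\langle 1 \atop k-1 \right\rangle + \frac{1}{2!}\left\langle 2 \atop k-1 \right\rangle + \frac{1}{3!}\left\langle 3 \atop k-1 \right\rangle + \cdots = \sum_{m\ge1}\left\langle m \atop k-1 \right\rangle\frac{1}{m!}. \tag{24}$$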
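Formula (24) is easy to evaluate numerically, because its terms are at most $k^m/m!$. The sketch below is an illustration under two stated assumptions. First, it builds the Eulerian numbers from the standard recurrence ⟨n,k⟩ = (k+1)⟨n−1,k⟩ + (n−k)⟨n−1,k−1⟩, which is not derived in this excerpt. Second, it truncates the series after a cutoff of 120 terms; that cutoff is our own choice and is far more than enough for k ≤ 18.

```python
import math
from fractions import Fraction


def eulerian_table(n_max):
    """E[n][k] = <n, k>, the number of permutations of {1, ..., n} with k + 1 runs.

    Built from the standard recurrence <n, k> = (k+1)<n-1, k> + (n-k)<n-1, k-1>,
    starting from <0, 0> = 1.
    """
    E = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    E[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(n):
            E[n][k] = (k + 1) * E[n - 1][k] + (n - k) * (E[n - 1][k - 1] if k else 0)
    return E


def average_kth_run(k, terms=120):
    """Truncation of (24): sum_{m=1}^{terms} <m, k-1> / m!, as an exact Fraction.

    The m-th term is at most k**m / m!, so the neglected tail is tiny for small k.
    """
    E = eulerian_table(terms)
    return sum(Fraction(E[m][k - 1], math.factorial(m)) for m in range(1, terms + 1))


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, float(average_kth_run(k)))
```

The first few printed values can be compared with the closed forms for $L_1$, $L_2$, $L_3$ given in the text. Later values can be compared with Table 2.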

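Since $\left\langle 0 \atop k-1 \right\rangle = 0$ except when $k = 1$, $L_k$ turns out to be the coefficient of $z^{k-1}$ in the generating function $g(z,1) - 1$ (see Eq. (19)). So we have

$$L(z) = \sum_{k\ge0} L_k z^k = \frac{(1 - e^{z-1})\,z}{e^{z-1} - z}. \tag{25}$$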

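From Euler’s formula (13) we obtain a representation of $L_k$ as a polynomial in $e$:

$$\begin{aligned} L_k &= \sum_{m\ge0}\frac{1}{m!}\sum_{j=0}^{k}(-1)^{k-j}\binom{m+1}{k-j}\,j^m\\ &= \sum_{j=0}^{k}\frac{(-1)^{k-j}j^{k-j}}{(k-j)!}\sum_{n\ge0}\frac{j^n}{n!} + \sum_{j=0}^{k-1}\frac{(-1)^{k-j}j^{k-j-1}}{(k-j-1)!}\sum_{n\ge0}\frac{j^n}{n!}\\ &= k\sum_{j=0}^{k}\frac{(-1)^{k-j}j^{k-j-1}}{(k-j)!}\,e^{j}, \end{aligned} \tag{26}$$

where $0^0 = 1$.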


This formula for $L_k$ was first obtained by B. J. Gassner [see CACM 10 (1967), 89–93]. In particular, we have

$$\begin{aligned} L_1 &= e - 1 \approx 1.71828\ldots;\\ L_2 &= e^2 - 2e \approx 1.95249\ldots;\\ L_3 &= e^3 - 3e^2 + \tfrac32 e \approx 1.99579\ldots. \end{aligned}$$

On the average, the second run is expected to be longer than the first, and the third run longer still. This may seem surprising at first glance. But a moment’s reflection shows that the first element of the second run tends to be small, since it is what caused the first run to terminate; hence there is a better chance for the second run to go on longer. The first element of the third run will tend to be even smaller than that of the second.

Table 2
AVERAGE LENGTH OF THE kTH RUN

  k   L_k                       k   L_k
  1   1.71828 18284 59045+     10   2.00000 00012 05997+
  2   1.95249 24420 12560-     11   2.00000 00001 93672+
  3   1.99579 13690 84285-     12   1.99999 99999 99909+
  4   2.00003 88504 76806-     13   1.99999 99999 97022-
  5   2.00005 75785 89716+     14   1.99999 99999 99719+
  6   2.00000 50727 55710-     15   2.00000 00000 00019+
  7   1.99999 96401 44022+     16   2.00000 00000 00006+
  8   1.99999 98889 04744+     17   2.00000 00000 00000+
  9   1.99999 99948 43434-     18   2.00000 00000 00000-

The numbers $L_k$ are important in the theory of replacement-selection sorting (Section 5.4.1), so it is interesting to study their values in detail. Table 2 shows the first 18 values of $L_k$ to 15 decimal places. Our discussion in the preceding paragraph might lead us to suspect at first that $L_{k+1} > L_k$, but in fact the values oscillate back and forth. Notice that $L_k$ rapidly approaches the limiting value 2. It is quite remarkable to see these monic polynomials in the transcendental number $e$ converge to the rational number 2 so quickly!

The polynomials (26) are also somewhat interesting from the standpoint of numerical analysis. They provide an excellent example of the loss of significant figures when nearly equal numbers are subtracted. Using 19-digit floating point arithmetic, Gassner concluded incorrectly that $L_{12} > 2$. John W. Wrench, Jr., has remarked that 42-digit floating point arithmetic gives $L_{28}$ correct to only 29 significant digits.
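This loss of significance can be observed directly. In the illustrative Python sketch below (helper names are our own), the rational coefficients of (26) are computed exactly. The polynomial in e is then evaluated twice: once in IEEE double precision, and once with Python's `decimal` module at 60 significant digits. We do not claim any particular double-precision output, because rounding details vary. The point is only that once k is around 12, the double-precision error can exceed the tiny difference $L_k - 2$ in size.

```python
import math
from decimal import Decimal, getcontext
from fractions import Fraction


def gassner_coefficients(k):
    """Rational c_0, ..., c_k with L_k = sum_j c_j * e**j, read off from (26):
    c_j = k * (-1)**(k-j) * j**(k-j-1) / (k-j)!, using 0**0 = 1."""
    coeffs = []
    for j in range(k + 1):
        c = Fraction(k) * Fraction(j) ** (k - j - 1) / math.factorial(k - j)
        coeffs.append(c if (k - j) % 2 == 0 else -c)
    return coeffs


def L_double(k):
    """Evaluate (26) in IEEE double precision (roughly 16 significant digits)."""
    return sum(float(c) * math.e ** j for j, c in enumerate(gassner_coefficients(k)))


def L_decimal(k, digits=60):
    """Evaluate (26) with `digits` significant decimal digits."""
    getcontext().prec = digits
    e = Decimal(1).exp()
    return sum(Decimal(c.numerator) / Decimal(c.denominator) * e ** j
               for j, c in enumerate(gassner_coefficients(k)))


if __name__ == "__main__":
    for k in (4, 12, 20):
        exact = L_decimal(k)
        print(k, exact, L_double(k) - float(exact))
```

For k = 12 the individual terms $c_j e^j$ are of order $10^6$, while $L_{12} - 2$ is of order $10^{-13}$. Double precision therefore cannot reliably decide the sign of $L_{12} - 2$.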

The asymptotic behavior of $L_k$ can be determined by using simple principles of complex variable theory. The denominator of (25) is zero only when $e^{z-1} = z$. Writing $z = x + iy$, this happens when

$$e^{x-1}\cos y = x \quad\text{and}\quad e^{x-1}\sin y = y. \tag{27}$$

Figure 3 shows the superimposed graphs of these two equations. They intersect at the points $z = z_0, z_1, \bar z_1, z_2, \bar z_2, \ldots$, where $z_0 = 1$,

$$z_1 = (3.08884\,30156\,13044-) + (7.46148\,92856\,54255-)\,i, \tag{28}$$

and the imaginary part $\Im(z_{k+1})$ is roughly equal to $\Im(z_k) + 2\pi$ for large $k$. Since

$$\lim_{z\to z_k}\left(\frac{1 - z}{e^{z-1} - z}\right)(z - z_k) = -1, \qquad\text{for } k > 0,$$

and since the limit is $-2$ for $k = 0$, the function

$$R_m(z) = L(z) + \frac{2z}{z - z_0} + \frac{z}{z - z_1} + \frac{z}{z - \bar z_1} + \frac{z}{z - z_2} + \frac{z}{z - \bar z_2} + \cdots + \frac{z}{z - z_m} + \frac{z}{z - \bar z_m}$$

has no singularities in the complex plane for $|z| < |z_{m+1}|$. Hence $R_m(z)$ has a power series expansion $\sum_k \rho_k z^k$ that converges absolutely when $|z| < |z_{m+1}|$. It follows that $\rho_k M^k \to 0$ as $k \to \infty$, where $M = |z_{m+1}| - \epsilon$. The coefficients of $L(z)$ are the coefficients of

$$\frac{2z}{1 - z} + \frac{z/z_1}{1 - z/z_1} + \frac{z/\bar z_1}{1 - z/\bar z_1} + \cdots + \frac{z/z_m}{1 - z/z_m} + \frac{z/\bar z_m}{1 - z/\bar z_m} + R_m(z),$$

namely,

$$L_n = 2 + 2r_1^{-n}\cos n\theta_1 + 2r_2^{-n}\cos n\theta_2 + \cdots + 2r_m^{-n}\cos n\theta_m + O(r_{m+1}^{-n}), \tag{29}$$

if we let

$$z_k = r_k e^{i\theta_k}. \tag{30}$$

This shows the asymptotic behavior of $L_n$. We have

$$\begin{aligned} r_1 &= 8.07556\,64528\,89526-, & \theta_1 &= 1.17830\,39784\,74668+;\\ r_2 &= 14.35456\,68997\,62106-, & \theta_2 &= 1.31268\,53883\,87636+;\\ r_3 &= 20.62073\,15381\,80628-, & \theta_3 &= 1.37427\,90757\,91688-;\\ r_4 &= 26.88795\,29424\,54546-, & \theta_4 &= 1.41049\,72786\,51865-; \end{aligned} \tag{31}$$


so the main contribution to $L_n - 2$ is due to $r_1$ and $\theta_1$, and convergence of (29) is quite rapid. Further analysis [W. W. Hooker, CACM 12 (1969), 411–413] shows that $R_m(z) \to cz$ for some constant $c$ as $m \to \infty$. Hence the series $2 + 2\sum_{k\ge1} r_k^{-n}\cos n\theta_k$ actually converges to $L_n$ when $n > 1$. (See also exercise 28.)
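The roots $z_k$ and the expansion (29) can be checked numerically. The sketch below is our own illustration. It finds $z_1, \ldots, z_4$ by Newton's method on $f(z) = e^{z-1} - z$. The starting guess $y \approx 2\pi k + \pi/2$, $x \approx 1 + \ln y$ is an assumption suggested by (27) and by the spacing of about $2\pi$ noted above; it is not a formula from the text. The code then sums the first four oscillating terms of (29).

```python
import cmath
import math


def root_z(k, iterations=60):
    """Approximate z_k, the k-th root of e**(z-1) = z in the upper half-plane.

    Newton's method on f(z) = e**(z-1) - z, started from the heuristic guess
    y ~ 2*pi*k + pi/2, x ~ 1 + ln(y) (our assumption, not from the text).
    """
    y0 = 2 * math.pi * k + math.pi / 2
    z = complex(1 + math.log(y0), y0)
    for _ in range(iterations):
        w = cmath.exp(z - 1)
        step = (w - z) / (w - 1)   # f(z) / f'(z)
        z -= step
        if abs(step) < 1e-15 * abs(z):
            break
    return z


def L_asymptotic(n, m=4):
    """Truncation of (29): 2 + 2 * sum_{k=1}^{m} r_k**(-n) * cos(n * theta_k)."""
    total = 2.0
    for k in range(1, m + 1):
        r, theta = cmath.polar(root_z(k))
        total += 2 * r ** (-n) * math.cos(n * theta)
    return total


if __name__ == "__main__":
    for k in range(1, 5):
        r, theta = cmath.polar(root_z(k))
        print(k, root_z(k), r, theta)
    print(L_asymptotic(5))
```

With m = 4, the omitted terms are of order $r_5^{-n}$. The truncated sum for n = 5 should therefore agree with the tabulated $L_5$ to roughly seven decimal places.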

A more careful examination of probabilities determines the complete probability distribution for the length of the $k$th run and for the total length of the first $k$ runs (see exercises 9, 10, 11). The sum $L_1 + \cdots + L_k$ turns out to be asymptotically $2k - \frac13 + O(8^{-k})$.


Let  us  conclude  this  section  by  considering  the  properties  of  runs  when  equal 
elements are allowed to appear in the permutations. The famous nineteenth-century American astronomer Simon Newcomb amused himself by playing a
game  of  solitaire  related  to  this  question.  He  would  deal  a deck  of  cards  into  a 
pile,  so  long  as  the  face  values  were  in  nondecreasing  order;  but  whenever  the 
next  card  to  be  dealt  had  a face  value  lower  than  its  predecessor,  he  would  start 
a new  pile.  He  wanted  to  know  the  probability  that  a given  number  of  piles 
would  be  formed  after  the  entire  deck  had  been  dealt  out  in  this  manner. 

Simon Newcomb’s problem therefore consists of finding the probability distribution of runs in a random permutation of a multiset. The general answer
is  rather  complicated  (see  exercise  12),  although  we  have  already  seen  how  to 
solve  the  special  case  when  all  cards  have  a distinct  face  value.  We  will  content 
ourselves  here  with  a derivation  of  the  average  number  of  piles  that  appear  in 
the  game. 

Suppose first that there are $m$ different types of cards, each occurring exactly $p$ times. An ordinary bridge deck, for example, has $m = 13$ and $p = 4$ if suits are disregarded. P. A. MacMahon discovered a remarkable symmetry that applies to this case [Combinatory Analysis 1 (Cambridge, 1915), 212–213]: the number of permutations with $k + 1$ runs is the same as the number with $mp - p - k + 1$ runs. When $p = 1$, this relation is Eq. (7), but for $p > 1$ it is quite surprising.

[Fig. 3. Roots of $e^{z-1} = z$, shown as the intersections of the curves $e^{x-1}\sin y = y$ and $e^{x-1}\cos y = x$.]

We can prove the symmetry by setting up a one-to-one correspondence between the permutations, in such a way that each permutation with $k + 1$ runs corresponds to another having $mp - p - k + 1$ runs. The reader is urged to try discovering such a correspondence before reading further.

No very simple correspondence is evident; MacMahon’s proof was based on generating functions rather than a combinatorial construction. But Foata’s correspondence (Theorem 5.1.2B) provides a useful simplification. It gives a one-to-one correspondence between multiset permutations with $k + 1$ runs and permutations whose two-line notation contains exactly $k$ columns $\binom{y}{x}$ with $x < y$.

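Suppose the given multiset is $\{p\cdot1, p\cdot2, \ldots, p\cdot m\}$, and consider the permutation whose two-line notation is

$$\begin{pmatrix} 1 & \ldots & 1 & 2 & \ldots & 2 & \ldots & m & \ldots & m\\ x_{11} & \ldots & x_{1p} & x_{21} & \ldots & x_{2p} & \ldots & x_{m1} & \ldots & x_{mp} \end{pmatrix}. \tag{32}$$

We can associate this permutation with another one,

$$\begin{pmatrix} 1 & \ldots & 1 & 2 & \ldots & 2 & \ldots & m & \ldots & m\\ x'_{11} & \ldots & x'_{1p} & x'_{m1} & \ldots & x'_{mp} & \ldots & x'_{21} & \ldots & x'_{2p} \end{pmatrix}, \tag{33}$$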


where $x' = m + 1 - x$. Suppose (32) contains $k$ columns of the form $\binom{y}{x}$ with $x < y$. Then (33) contains $(m-1)p - k$ such columns, because we need only consider the case $y > 1$, and $x < y$ is equivalent to $x' \ge m + 2 - y$. Now (32) corresponds to a permutation with $k + 1$ runs, and (33) corresponds to a permutation with $mp - p - k + 1$ runs. The transformation that takes (32) into (33) is reversible: it takes (33) back into (32). Therefore MacMahon’s symmetry condition has been established. See exercise 14 for an example of this construction.
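MacMahon's symmetry can also be confirmed by brute force for small decks. This Python sketch is our own, for illustration only. It lists the distinct permutations of $\{p\cdot1, \ldots, p\cdot m\}$ and counts runs as one more than the number of descents. It then compares the counts for $k + 1$ and $mp - p - k + 1$ runs. It does not implement the correspondence between (32) and (33); it only checks the resulting symmetry.

```python
from collections import Counter
from itertools import permutations


def count_runs(seq):
    """Number of ascending runs: one more than the number of i with a_i > a_{i+1}."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def runs_distribution(m, p):
    """Counter: r -> number of distinct permutations of {p*1, ..., p*m} with r runs."""
    multiset = [v for v in range(1, m + 1) for _ in range(p)]
    return Counter(count_runs(s) for s in set(permutations(multiset)))


def has_macmahon_symmetry(m, p):
    """Check that k+1 runs and mp-p-k+1 runs occur equally often, for all k."""
    dist = runs_distribution(m, p)
    return all(dist[k + 1] == dist[m * p - p - k + 1] for k in range(m * p - p + 1))


if __name__ == "__main__":
    for m, p in [(3, 2), (2, 3), (4, 2)]:
        print(m, p, sorted(runs_distribution(m, p).items()), has_macmahon_symmetry(m, p))
```

Generating all n! tuples and deduplicating them with a set is wasteful, but it keeps the sketch short for decks of eight or fewer cards.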

Because of the symmetry property, the average number of runs in a random permutation must be $\frac12\bigl((k+1) + (mp - p - k + 1)\bigr) = 1 + \frac12 p(m-1)$. For example, Simon Newcomb’s solitaire game with a standard deck produces 25 piles on the average, so it doesn’t appear to be a very exciting way to play solitaire.

We can actually determine the average number of runs in general, using a fairly simple argument, given any multiset $\{n_1\cdot x_1, n_2\cdot x_2, \ldots, n_m\cdot x_m\}$ where the $x$’s are distinct. Let $n = n_1 + n_2 + \cdots + n_m$, and imagine that all of the permutations $a_1 a_2 \ldots a_n$ of this multiset have been written down. For each fixed value of $i$, $1 \le i < n$, we will count how often $a_i$ is greater than $a_{i+1}$. The number of times $a_i > a_{i+1}$ is just half of the number of times $a_i \ne a_{i+1}$. It is not difficult to see that $a_i = a_{i+1} = x_j$ exactly $Nn_j(n_j - 1)/n(n-1)$ times, where $N$ is the total number of permutations. Hence $a_i = a_{i+1}$ exactly

$$\frac{N}{n(n-1)}\bigl(n_1(n_1-1) + \cdots + n_m(n_m-1)\bigr) = \frac{N}{n(n-1)}\bigl(n_1^2 + \cdots + n_m^2 - n\bigr)$$

times, and $a_i > a_{i+1}$ exactly

$$\frac{N}{2n(n-1)}\bigl(n^2 - (n_1^2 + \cdots + n_m^2)\bigr)$$

times. Summing over $i$, and adding $N$ because a run ends at $a_n$ in each permutation, gives the total number of runs among all $N$ permutations:

$$\frac{N}{2n}\bigl(n^2 - (n_1^2 + n_2^2 + \cdots + n_m^2)\bigr) + N. \tag{34}$$

Dividing by $N$ gives the desired average number of runs.
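Formula (34) can be checked by brute force for small multisets. The following sketch uses hypothetical function names and is only an illustration. It compares the exact mean number of runs over all distinct permutations of a small multiset with $1 + (n^2 - n_1^2 - \cdots - n_m^2)/(2n)$, which is (34) divided by $N$. It also evaluates that expression for Simon Newcomb's 52-card deck.

```python
from fractions import Fraction
from itertools import permutations


def count_runs(seq):
    """Number of ascending runs: one more than the number of descents a_i > a_{i+1}."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def mean_runs_bruteforce(counts):
    """Exact mean number of runs over the distinct permutations of the multiset
    with counts[0] copies of 1, counts[1] copies of 2, and so on."""
    multiset = [v for v, c in enumerate(counts, start=1) for _ in range(c)]
    perms = set(permutations(multiset))
    return Fraction(sum(count_runs(s) for s in perms), len(perms))


def mean_runs_formula(counts):
    """(34) divided by N: 1 + (n**2 - (n_1**2 + ... + n_m**2)) / (2n)."""
    n = sum(counts)
    return 1 + Fraction(n * n - sum(c * c for c in counts), 2 * n)


if __name__ == "__main__":
    for counts in ([1, 1, 1, 1], [2, 2, 2], [3, 1, 2], [4, 4]):
        print(counts, mean_runs_bruteforce(counts), mean_runs_formula(counts))
    print("standard deck:", mean_runs_formula([4] * 13))
```

When all $n_j$ are equal to $p$, the formula reduces to $1 + \frac12 p(m-1)$, in agreement with the symmetry argument.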

Since  runs  are  important  in  the  study  of  “order  statistics,”  there  is  a fairly 
large  literature  dealing  with  them,  including  several  other  types  of  runs  not 
considered  here.  For  additional  information,  see  the  book  Combinatorial  Chance 
by  F.  N.  David  and  D.  E.  Barton  (London:  Griffin,  1962),  Chapter  10;  and  the 
survey  paper  by  D.  E.  Barton  and  C.  L.  Mallows,  Annals  of  Math.  Statistics  36 
(1965),  236-260. 


EXERCISES 

1.  [M26]  Derive  Euler’s  formula  (13). 

► 2. [M22] (a) Extend the idea used in the text to prove (8), considering those sequences $a_1 a_2 \ldots a_n$ that contain exactly $q$ distinct elements, in order to prove the
formula 


integer  q > 0. 
(b) Use this identity to prove that


m)!, 


for  n > m. 


3. [HM25] Evaluate the sum $\sum_k \left\langle n \atop k \right\rangle (-1)^k$.

4. [M21] What is the value of $\sum_k (-1)^k \left\{ n \atop k \right\} k! \binom{n-k}{m}$?

5. [M20] Deduce the value of $\left\langle p \atop k \right\rangle \bmod p$ when $p$ is prime.

► 6.  [M21]  Mr.  B.  C.  Dull  noticed  that,  by  Eqs.  (4)  and  (13), 

k>  ox/c/  k>0j>0  yle 

Carrying out the sum on $k$ first, he found that $\sum_{k\ge0}(-1)^{k-j}\binom{n+1}{k-j} = 0$ for $j \ge 0$; hence $n! = 0$ for all $n \ge 0$. Did he make a mistake?

7. [HM40] Is the probability distribution of runs, given by (14), asymptotically normal? (See exercise 1.2.10-13.)

8. [M24] (P. A. MacMahon.) Show that the probability that the first run of a sufficiently long permutation has length $l_1$, the second has length $l_2$, ..., and the $k$th has length $\ge l_k$, is

$$\det\begin{pmatrix} 1/l_1! & 1/(l_1+l_2)! & 1/(l_1+l_2+l_3)! & \ldots & 1/(l_1+l_2+l_3+\cdots+l_k)!\\ 1 & 1/l_2! & 1/(l_2+l_3)! & \ldots & 1/(l_2+l_3+\cdots+l_k)!\\ 0 & 1 & 1/l_3! & \ldots & 1/(l_3+\cdots+l_k)!\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & \ldots & 1 & 1/l_k! \end{pmatrix}.$$

9. [M30] Let $h_k(z) = \sum p_{km} z^m$, where $p_{km}$ is the probability that $m$ is the total length of the first $k$ runs in a random (infinite) sequence. Find “simple” expressions for $h_1(z)$, $h_2(z)$, and the super generating function $h(z,x) = \sum_k h_k(z)x^k$.

10. [HM30] Find the asymptotic behavior of the mean and variance of the distributions $h_k(z)$ in the preceding exercise, for large $k$.

11. [M40] Let $H_k(z) = \sum P_{km} z^m$, where $P_{km}$ is the probability that $m$ is the length of the $k$th run in a random (infinite) sequence. Express $H_1(z)$, $H_2(z)$, and the super generating function $H(z,x) = \sum_k H_k(z)x^k$ in terms of familiar functions.

12. [M33] (P. A. MacMahon.) Generalize Eq. (13) to permutations of a multiset, by proving that the number of permutations of $\{n_1\cdot1, n_2\cdot2, \ldots, n_m\cdot m\}$ having exactly $k$ runs is

where $n = n_1 + n_2 + \cdots + n_m$.

13.  [05]  If  Simon  Newcomb’s  solitaire  game  is  played  with  a standard  bridge  deck, 
ignoring  face  value  but  treating  clubs  < diamonds  < hearts  < spades,  what  is  the 
average  number  of  piles? 

14. [M18] The permutation 3111231423342244 has 5 runs. Find the corresponding permutation with 9 runs, according to the text’s construction for MacMahon’s symmetry condition.
► 15. [M21] (Alternating runs.) The classical nineteenth-century literature of combinatorial analysis did not treat the topic of runs in permutations as we have considered them. But several authors studied “runs” that are alternately ascending and descending. Thus 53247618 was considered to have 4 runs: 5 3 2, 2 4 7, 7 6 1, and 1 8. (The first run would be ascending or descending, according as $a_1 < a_2$ or $a_1 > a_2$; thus $a_1 a_2 \ldots a_n$ and $a_n \ldots a_2 a_1$ and $(n+1-a_1)(n+1-a_2)\ldots(n+1-a_n)$ all have the same number of alternating runs.) When $n$ elements are being permuted, the maximum number of runs of this kind is $n - 1$.

Find  the  average  number  of  alternating  runs  in  a random  permutation  of  the  set 
{1, 2, . . . , n}.  [Hint:  Consider  the  proof  of  (34).] 

16.  [M30]  Continuing  the  previous  exercise,  let  \nk\  be  the  number  of  permutations 
of  {1,2,  ...,n}  that  have  exactly  k alternating  runs.  Find  a recurrence  relation,  by 
means  of  which  a table  of  ]^)(  can  be  computed;  and  find  the  corresponding  recurrence 
relation  for  the  generating  function  Gn(z)  = Y)kll}zk/n'-  Use  the  latter  recurrence 
to  discover  a simple  formula  for  the  variance  of  the  number  of  alternating  runs  in  a 
random  permutation  of  {1, 2, ... , n}. 

17. [M25] Among all $2^n$ sequences $a_1 a_2 \ldots a_n$, where each $a_j$ is either 0 or 1, how many have exactly $k$ runs (that is, $k - 1$ occurrences of $a_j > a_{j+1}$)?

18. [M28] Among all $n!$ sequences $b_1 b_2 \ldots b_n$ such that each $b_j$ is an integer in the range $0 \le b_j \le n - j$, how many have (a) exactly $k$ descents (that is, $k$ occurrences of $b_j > b_{j+1}$)? (b) exactly $k$ distinct elements?


Fig.  4.  Nonattacking  rooks  on  a chessboard,  with  k = 3 rooks  below  the  main  diagonal. 

► 19. [M26] (I. Kaplansky and J. Riordan, 1946.) (a) In how many ways can $n$ nonattacking rooks (no two in the same row or column) be placed on an $n \times n$ chessboard, so that exactly $k$ lie below the main diagonal? (b) In how many ways can $k$ nonattacking rooks be placed below the main diagonal of an $n \times n$ chessboard?

For  example,  Fig.  4 shows  one  of  the  15619  ways  to  put  eight  nonattacking  rooks 
on  a standard  chessboard  with  exactly  three  rooks  in  the  unshaded  portion  below  the 
main  diagonal,  together  with  one  of  the  1050  ways  to  put  three  nonattacking  rooks  on 
a triangular  board. 

► 20. [M21] A permutation is said to require $k$ readings if we must scan it $k$ times from left to right in order to read off its elements in nondecreasing order. For example, the permutation 491825367 requires four readings: on the first we obtain 1, 2, 3; on the second we get 4, 5, 6, 7; then 8; then 9. Find a connection between runs and readings.

21. [M22] If the permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$ has $k$ runs and requires $j$ readings, in the sense of exercise 20, what can be said about $a_n \ldots a_2 a_1$?

22. [M26] (L. Carlitz, D. P. Roselle, and R. A. Scoville.) Show that no permutation of $\{1, 2, \ldots, n\}$ has $n + 1 - r$ runs and requires $s$ readings if $rs < n$. Show also that such permutations do exist if $n \ge n + 1 - r \ge s \ge 1$ and $rs \ge n$.

23. [HM42] (Walter Weissblum.) The “long runs” of a permutation $a_1 a_2 \ldots a_n$ are obtained by placing vertical lines just before a segment fails to be monotonic. Long runs are either increasing or decreasing, depending on the order of their first two elements, so the length of each long run (except possibly the last) is $\ge 2$. For example, 75|62|389|14 has four long runs. Find the average length of the first two long runs of an infinite permutation, and prove that the limiting long-run length is

$$(1 + \cot\tfrac12)/(3 - \cot\tfrac12) \approx 2.4202.$$

24. [M30] What is the average number of runs in sequences generated as in exercise 5.1.1-18, as a function of $p$?

25. [M25] Let $U_1, \ldots, U_n$ be independent uniform random numbers in $[0\,.\,.\,1)$. What is the probability that $\lfloor U_1 + \cdots + U_n \rfloor = k$?

26. [M20] Let $\vartheta$ be the operation $z\,\frac{d}{dz}$, which multiplies the coefficient of $z^n$ in a generating function by $n$. Show that the result of applying $\vartheta$ to $1/(1-z)$ repeatedly, $m$ times, can be expressed in terms of Eulerian numbers.

► 27. [M21] An increasing forest is an oriented forest in which the nodes are labeled $\{1, 2, \ldots, n\}$ in such a way that parents have smaller numbers than their children. Show that $\left\langle n \atop k \right\rangle$ is the number of $n$-node increasing forests with $k + 1$ leaves.

28. [HM35] Find the asymptotic value of the numbers $z_m$ in Fig. 3 as $m \to \infty$, and prove that $\sum_{m=1}^{\infty}(z_m^{-1} + \bar z_m^{-1}) = e - \frac52$.

► 29. [M30] The permutation $a_1 \ldots a_n$ has a “peak” at $a_j$ if $1 < j < n$ and $a_{j-1} < a_j > a_{j+1}$. Let $s_{nk}$ be the number of permutations with exactly $k$ peaks, and let $t_{nk}$ be the
number  with  k peaks  and  k descents.  Prove  that  (a)  snk  = 5) (2nJ  + j (2fc+ J + ^'{2k+2j 
(see exercise 16); (b) $s_{nk} = 2^{n-1-2k}t_{nk}$; (c) $\sum_k \left\langle n \atop k \right\rangle x^k = \sum_k t_{nk}x^k(1+x)^{n-1-2k}$.

*5.1.4.  Tableaux  and  Involutions 

To  complete  our  survey  of  the  combinatorial  properties  of  permutations,  we 
will  discuss  some  remarkable  relations  that  connect  permutations  with  arrays 
of integers called tableaux. A Young tableau of shape $(n_1, n_2, \ldots, n_m)$, where $n_1 \ge n_2 \ge \cdots \ge n_m > 0$, is an arrangement of $n_1 + n_2 + \cdots + n_m$ distinct integers in an array of left-justified rows, with $n_i$ elements in row $i$. The entries of each row must increase from left to right, and the entries of each column must increase from top to bottom. For example,


$$\begin{array}{llllll} 1 & 2 & 5 & 9 & 10 & 15\\ 3 & 6 & 7 & 13\\ 4 & 8 & 12 & 14\\ 11 \end{array} \tag{1}$$
is a Young tableau of shape (6, 4, 4, 1). Such arrangements were introduced by
Alfred  Young  as  an  aid  to  the  study  of  matrix  representations  of  permutations 
[see  Proc.  London  Math.  Soc.  (2)  28  (1928),  255-292;  Bruce  E.  Sagan,  The 
Symmetric  Group  (Pacific  Grove,  Calif.:  Wadsworth  & Brooks/Cole,  1991)].  For 
simplicity,  we  will  simply  say  “tableau”  instead  of  “Young  tableau.” 

An  involution  is  a permutation  that  is  its  own  inverse.  For  example,  there 
are  ten  involutions  of  {1,  2,  3,  4}, 


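$$\begin{gathered}
\begin{pmatrix}1&2&3&4\\1&2&3&4\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\2&1&3&4\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\3&2&1&4\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\4&2&3&1\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\1&3&2&4\end{pmatrix}\\
\begin{pmatrix}1&2&3&4\\1&4&3&2\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\1&2&4&3\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\2&1&4&3\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\3&4&1&2\end{pmatrix}\quad
\begin{pmatrix}1&2&3&4\\4&3&2&1\end{pmatrix}
\end{gathered} \tag{2}$$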


The  term  “involution”  originated  in  classical  geometry  problems;  involutions  in 
the  general  sense  considered  here  were  first  studied  by  H.  A.  Rothe  when  he 
introduced  the  concept  of  inverses  (see  Section  5.1.1). 

It may appear strange that we should be discussing both tableaux and involutions at the same time. But there is an extraordinary connection between these two apparently unrelated concepts: the number of involutions of $\{1, 2, \ldots, n\}$ is the same as the number of tableaux that can be formed from the elements $\{1, 2, \ldots, n\}$. For example, exactly ten tableaux can be formed from $\{1, 2, 3, 4\}$, namely,


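$$\begin{array}{llllllllll}
1\,2\,3\,4 & 1\,3\,4 & 1\,4 & 1\,3 & 1\,2\,4 & 1\,2 & 1\,2\,3 & 1\,3 & 1\,2 & 1\\
 & 2 & 2 & 2 & 3 & 3 & 4 & 2\,4 & 3\,4 & 2\\
 & & 3 & 4 & & 4 & & & & 3\\
 & & & & & & & & & 4
\end{array} \tag{3}$$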


corresponding  respectively  to  the  ten  involutions  (2). 
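The equality of the two counts can be checked numerically for small n; this is a check, not a proof. The sketch below is our own. It counts involutions by brute force. It counts tableaux of each shape with a recursion that removes the largest entry n, which must sit at a corner (the end of a row that is longer than the row below it).

```python
from functools import lru_cache
from itertools import permutations


def involutions(n):
    """Permutations of {1, ..., n} (one-line notation) that are their own inverse."""
    return [p for p in permutations(range(1, n + 1))
            if all(p[p[i] - 1] == i + 1 for i in range(n))]


def partitions(n, largest=None):
    """Partitions of n as non-increasing tuples; these are the possible shapes."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


@lru_cache(maxsize=None)
def tableaux_of_shape(shape):
    """Number of tableaux of the given shape whose entries are {1, ..., n}."""
    if not shape:
        return 1
    total = 0
    for i, row in enumerate(shape):
        below = shape[i + 1] if i + 1 < len(shape) else 0
        if row > below:  # the cell at the end of row i is a corner
            smaller = list(shape)
            smaller[i] -= 1
            total += tableaux_of_shape(tuple(r for r in smaller if r > 0))
    return total


def tableaux_count(n):
    """Total number of tableaux formed from {1, ..., n}, summed over all shapes."""
    return sum(tableaux_of_shape(shape) for shape in partitions(n))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, len(involutions(n)), tableaux_count(n))
```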

This  connection  between  involutions  and  tableaux  is  by  no  means  obvious, 
and  there  is  probably  no  very  simple  way  to  prove  it.  The  proof  we  will  discuss 
involves  an  interesting  tableau-construction  algorithm  that  has  several  other 
surprising  properties.  It  is  based  on  a special  procedure  that  inserts  new  elements 
into  a tableau. 


For  example,  suppose  that  we  want  to  insert  the  element  8 into  the  tableau 


$$\begin{array}{llllll} 1 & 3 & 5 & 9 & 12 & 16\\ 2 & 6 & 10 & 15\\ 4 & 13 & 14\\ 11\\ 17 \end{array} \tag{4}$$
The method we will use starts by placing the 8 into row 1, in the spot previously
occupied  by  9,  since  9 is  the  least  element  greater  than  8 in  that  row.  Element  9 is 
“bumped  down”  into  row  2,  where  it  displaces  the  10.  The  10  then  “bumps”  the 
13  from  row  3 to  row  4;  and  since  row  4 contains  no  element  greater  than  13,  the 
process  terminates  by  inserting  13  at  the  right  end  of  row  4.  Thus,  tableau  (4) 
has  been  transformed  into 


$$\begin{array}{llllll} 1 & 3 & 5 & 8 & 12 & 16\\ 2 & 6 & 9 & 15\\ 4 & 10 & 14\\ 11 & 13\\ 17 \end{array} \tag{5}$$


A precise  description  of  this  process,  together  with  a proof  that  it  always 
preserves  the  tableau  properties,  appears  in  Algorithm  I. 


Algorithm I (Insertion into a tableau). Let $P = (P_{ij})$ be a tableau of positive integers, and let $x$ be a positive integer not in $P$. This algorithm transforms $P$ into another tableau that contains $x$ in addition to its original elements. The new tableau has the same shape as the old one, except for one new position in row $s$, column $t$, where $s$ and $t$ are determined by the algorithm.

(The parenthesized remarks in this algorithm serve to prove its validity. It is easy to verify inductively that the remarks are valid and that the array $P$ remains a tableau throughout the process. For convenience we will assume that the tableau has been bordered by zeros at the top and left, and by $\infty$’s to the right and below, so that $P_{ij}$ is defined for all $i, j \ge 0$. Define the relation

$$a \preceq b \quad\text{if and only if}\quad a < b \ \text{ or } \ a = b = 0 \ \text{ or } \ a = b = \infty. \tag{6}$$

Then the tableau inequalities can be expressed in the convenient form

$$\begin{gathered} P_{ij} = 0 \quad\text{if and only if}\quad i = 0 \text{ or } j = 0;\\ P_{ij} \preceq P_{i(j+1)} \quad\text{and}\quad P_{ij} \preceq P_{(i+1)j}, \qquad\text{for all } i, j \ge 0. \end{gathered} \tag{7}$$

The statement “$x \notin P$” means that either $x = \infty$ or $x \ne P_{ij}$ for all $i, j \ge 0$.)

I1. [Input $x$.] Set $i \leftarrow 1$, set $x_1 \leftarrow x$, and set $j$ to the smallest value such that $P_{1j} = \infty$.

I2. [Find $x_{i+1}$.] (At this point $P_{(i-1)j} < x_i < P_{ij}$ and $x_i \notin P$.) If $x_i < P_{i(j-1)}$, decrease $j$ by 1 and repeat this step. Otherwise set $x_{i+1} \leftarrow P_{ij}$ and set $r_i \leftarrow j$.

I3. [Replace by $x_i$.] (Now $P_{i(j-1)} < x_i < x_{i+1} = P_{ij} \preceq P_{i(j+1)}$, $P_{(i-1)j} < x_i < x_{i+1} = P_{ij} \preceq P_{(i+1)j}$, and $r_i = j$.) Set $P_{ij} \leftarrow x_i$.

I4. [Is $x_{i+1} = \infty$?] (Now $P_{i(j-1)} < P_{ij} = x_i < x_{i+1} \preceq P_{i(j+1)}$, $P_{(i-1)j} < P_{ij} = x_i < x_{i+1} \preceq P_{(i+1)j}$, $r_i = j$, and $x_{i+1} \notin P$.) If $x_{i+1} \ne \infty$, increase $i$ by 1 and return to step I2.
I5. [Determine $s$, $t$.] Set $s \leftarrow i$, $t \leftarrow j$, and terminate the algorithm. (At this point the conditions

$$P_{st} \ne \infty \quad\text{and}\quad P_{(s+1)t} = P_{s(t+1)} = \infty \tag{8}$$

are satisfied.) |

Algorithm I defines a “bumping sequence”

$$x = x_1 < x_2 < \cdots < x_s < x_{s+1} = \infty, \tag{9}$$

as well as an auxiliary sequence of column indices

$$r_1 \ge r_2 \ge \cdots \ge r_s = t; \tag{10}$$

element $P_{ir_i}$ has been changed from $x_{i+1}$ to $x_i$, for $1 \le i \le s$. For example, when we inserted 8 into (4), the bumping sequence was 8, 9, 10, 13, $\infty$, and the auxiliary sequence was 4, 3, 2, 2. We could have reformulated the algorithm so that it used much less temporary storage, since only the current values of $j$, $x_i$, and $x_{i+1}$ need to be remembered. But sequences (9) and (10) have been introduced so that we can prove interesting things about the algorithm.
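Steps I1–I5 translate almost line for line into code. This Python sketch is our own transcription and assumes that a tableau is stored as a list of rows. A helper `entry` (hypothetical) simulates the zero and $\infty$ borders. The function also returns the sequences (9) and (10), so the example of inserting 8 into (4) can be reproduced.

```python
import math

INF = math.inf  # stands for the infinite border entries


def entry(P, i, j):
    """P_ij in the text's 1-based notation: 0 on the top and left borders,
    infinity to the right of each row and below the last row."""
    if i == 0 or j == 0:
        return 0
    if i > len(P) or j > len(P[i - 1]):
        return INF
    return P[i - 1][j - 1]


def insert(P, x):
    """Algorithm I: insert x (a positive integer not in P) into tableau P, in place.

    Returns (s, t, bumping, columns), where (s, t) is the new position, bumping
    is the sequence x_1 < x_2 < ... < x_{s+1} = INF of (9), and columns is the
    sequence r_1 >= r_2 >= ... >= r_s = t of (10).
    """
    i, xi = 1, x                                  # I1
    j = len(P[0]) + 1 if P else 1                 # smallest j with P_1j = INF
    bumping, columns = [x], []
    while True:
        while xi < entry(P, i, j - 1):            # I2: move left past larger entries
            j -= 1
        x_next = entry(P, i, j)                   # x_{i+1}
        columns.append(j)                         # r_i
        if i > len(P):                            # I3: P_ij <- x_i
            P.append([])
        if j > len(P[i - 1]):
            P[i - 1].append(xi)
        else:
            P[i - 1][j - 1] = xi
        bumping.append(x_next)
        if x_next == INF:                         # I4, then I5
            return i, j, bumping, columns
        i, xi = i + 1, x_next


if __name__ == "__main__":
    P = [[1, 3, 5, 9, 12, 16], [2, 6, 10, 15], [4, 13, 14], [11], [17]]  # tableau (4)
    s, t, bumping, columns = insert(P, 8)
    print(P)  # should be tableau (5)
    print(s, t, bumping, columns)
```

As in step I2, the column index j is not reset when the algorithm moves down a row. The search in row i + 1 starts at column $r_i$, which is why (10) is nonincreasing.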

The  key  fact  we  will  use  about  Algorithm  I is  that  it  can  be  run  backwards: 
Given  the  values  of  s and  t determined  in  step  15,  we  can  transform  P back 
into  its  original  form  again,  determining  and  removing  the  element  x that  was 
inserted.  For  example,  consider  (5)  and  suppose  we  are  told  that  element  13  is 
in  the  position  that  used  to  be  blank.  Then  13  must  have  been  bumped  down 
from  row  3 by  the  10,  since  10  is  the  greatest  element  less  than  13  in  that  row; 
similarly the 10 must have been bumped from row 2 by the 9, and the 9 must have been bumped from row 1 by the 8. Thus we can go from (5) back to (4).
The  following  algorithm  specifies  this  process  in  detail: 


Algorithm D (Deletion from a tableau). Given a tableau $P$ and positive integers $s$, $t$ satisfying (8), this algorithm transforms $P$ into another tableau of almost the same shape, but with $\infty$ in column $t$ of row $s$. An element $x$, determined by the algorithm, is deleted from $P$.

(As  in  Algorithm  I,  parenthesized  assertions  are  included  here  to  facilitate 
a proof  that  P remains  a tableau  throughout  the  process.) 

D1. [Input $s$, $t$.] Set $j \leftarrow t$, $i \leftarrow s$, $x_{s+1} \leftarrow \infty$.

D2. [Find $x_i$.] (At this point $P_{ij} < x_{i+1} \preceq P_{(i+1)j}$ and $x_{i+1} \notin P$.) If $P_{i(j+1)} < x_{i+1}$, increase $j$ by 1 and repeat this step. Otherwise set $x_i \leftarrow P_{ij}$ and $r_i \leftarrow j$.

D3. [Replace by $x_{i+1}$.] (Now $P_{i(j-1)} < P_{ij} = x_i < x_{i+1} \preceq P_{i(j+1)}$, $P_{(i-1)j} < P_{ij} = x_i < x_{i+1} \preceq P_{(i+1)j}$, and $r_i = j$.) Set $P_{ij} \leftarrow x_{i+1}$.

D4. [Is $i = 1$?] (Now $P_{i(j-1)} < x_i < x_{i+1} = P_{ij} \preceq P_{i(j+1)}$, $P_{(i-1)j} < x_i < x_{i+1} = P_{ij} \preceq P_{(i+1)j}$, and $r_i = j$.) If $i > 1$, decrease $i$ by 1 and return to step D2.

D5. [Determine $x$.] Set $x \leftarrow x_1$; the algorithm terminates. (Now $0 < x < \infty$.) |
The parenthesized assertions in Algorithms I and D do more than prove that the algorithms preserve the tableau structure. They also verify that Algorithms I and D are perfect inverses of each other. Suppose we perform Algorithm I first, given some tableau $P$ and some positive integer $x \notin P$. It will insert $x$ and determine positive integers $s$, $t$ satisfying (8). Algorithm D applied to the result will then recompute $x$ and restore $P$. Conversely, suppose we perform Algorithm D first, given some tableau $P$ and some positive integers $s$, $t$ satisfying (8). It will modify $P$ by deleting some positive integer $x$. Algorithm I applied to the result will then recompute $s$, $t$ and restore $P$. The reason is that the parenthesized assertions of steps I3 and D4 are identical, as are the assertions of steps I4 and D3, and these assertions characterize the value of $j$ uniquely. Hence the auxiliary sequences (9), (10) are the same in each case.
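Algorithm D and the inverse relationship can be tested directly on a small tableau. This sketch is our own. It includes a compact form of Algorithm I so that it is self-contained, and the helper `corners` is hypothetical. It deletes a corner cell following steps D1–D5, then checks both round trips: I followed by D, and D followed by I.

```python
import math

INF = math.inf


def entry(P, i, j):
    """P_ij with the zero/infinity borders assumed by Algorithms I and D."""
    if i == 0 or j == 0:
        return 0
    if i > len(P) or j > len(P[i - 1]):
        return INF
    return P[i - 1][j - 1]


def insert(P, x):
    """Algorithm I in compact form: insert x into P in place; return (s, t)."""
    i, j = 1, (len(P[0]) + 1 if P else 1)
    while True:
        while x < entry(P, i, j - 1):
            j -= 1
        bumped = entry(P, i, j)
        if i > len(P):
            P.append([])
        if j > len(P[i - 1]):
            P[i - 1].append(x)
        else:
            P[i - 1][j - 1] = x
        if bumped == INF:
            return i, j
        i, x = i + 1, bumped


def delete(P, s, t):
    """Algorithm D: delete from P (in place) the corner cell (s, t), which must
    satisfy (8), and return the element x that is pushed out of row 1."""
    j, i, x_up = t, s, INF                 # D1 (x_up plays the role of x_{i+1})
    while True:
        while entry(P, i, j + 1) < x_up:   # D2: move right past smaller entries
            j += 1
        x_i = P[i - 1][j - 1]
        if x_up == INF:                    # D3 at row s: the corner cell disappears
            P[i - 1].pop()
            if not P[i - 1]:
                P.pop()
        else:                              # D3 otherwise: x_{i+1} replaces x_i
            P[i - 1][j - 1] = x_up
        if i == 1:                         # D4, then D5
            return x_i
        i, x_up = i - 1, x_i


def corners(P):
    """Cells (s, t) satisfying (8): ends of rows that are longer than the row below."""
    return [(i + 1, len(row)) for i, row in enumerate(P)
            if i + 1 == len(P) or len(P[i + 1]) < len(row)]


if __name__ == "__main__":
    base = []
    for v in [7, 2, 9, 5, 3, 11, 1]:
        insert(base, v)
    for x in [4, 6, 8, 10, 12]:            # I then D restores P
        P = [row[:] for row in base]
        s, t = insert(P, x)
        assert delete(P, s, t) == x and P == base
    for s, t in corners(base):             # D then I restores P
        P = [row[:] for row in base]
        x = delete(P, s, t)
        assert insert(P, x) == (s, t) and P == base
    print("round trips OK for", base)
```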

Now  we  are  ready  to  prove  a basic  property  of  tableaux: 


Theorem A. There is a one-to-one correspondence between the set of all permutations of $\{1, 2, \ldots, n\}$ and the set of ordered pairs $(P, Q)$ of tableaux formed from $\{1, 2, \ldots, n\}$, where $P$ and $Q$ have the same shape.


(An  example  of  this  theorem  appears  within  the  proof  that  follows.) 

Proof. It is convenient to prove a slightly more general result. Given any two-line array

$$\begin{pmatrix} q_1 & q_2 & \ldots & q_n\\ p_1 & p_2 & \ldots & p_n \end{pmatrix}, \qquad q_1 < q_2 < \cdots < q_n, \quad p_1, p_2, \ldots, p_n \text{ distinct}, \tag{11}$$

we will construct two corresponding tableaux $P$ and $Q$ of the same shape, where the elements of $P$ are $\{p_1, \ldots, p_n\}$ and the elements of $Q$ are $\{q_1, \ldots, q_n\}$.

Let $P$ and $Q$ be empty initially. Then, for $i = 1, 2, \ldots, n$ (in this order), do the following operation: insert $p_i$ into tableau $P$ using Algorithm I, then set $Q_{st} \leftarrow q_i$, where $s$ and $t$ specify the newly filled position of $P$.

For example, if the given permutation is $\binom{1\ 3\ 5\ 6\ 8}{7\ 2\ 9\ 5\ 3}$, we obtain

                 P            Q
    Insert 7:    7            1

    Insert 2:    2            1
                 7            3

    Insert 9:    2 9          1 5
                 7            3

    Insert 5:    2 5          1 5
                 7 9          3 6

    Insert 3:    2 3          1 5
                 5 9          3 6
                 7            8              (12)

so the tableaux (P, Q) corresponding to $\binom{1\ 3\ 5\ 6\ 8}{7\ 2\ 9\ 5\ 3}$ are

    P =  2 3      Q =  1 5
         5 9           3 6
         7             8                     (13)


It is clear from this construction that P and Q always have the same shape;
furthermore, since we always add elements on the periphery of Q, in increasing
order, Q is a tableau.

Conversely, given two equal-shape tableaux P and Q, we can find the
corresponding two-line array (11) as follows. Let the elements of Q be

$$q_1 < q_2 < \cdots < q_n.$$

For i = n, ..., 2, 1 (in this order), let $p_i$ be the element x that is removed when
Algorithm D is applied to P, using the values s and t such that $Q_{st} = q_i$.

For example, this construction will start with (13) and will successively undo
the calculation (12) until P is empty, and $\binom{1\ 3\ 5\ 6\ 8}{7\ 2\ 9\ 5\ 3}$ is obtained.

Since Algorithms I and D are inverses of each other, the two constructions
we have described are inverses of each other, and the one-to-one correspondence
has been established. ∎
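
The construction in this proof is easy to try by machine. The Python sketch below is ours, not the book's: `rsk` and `rsk_inverse` are hypothetical names, tableaux are lists of rows, and we assume (consistently with the example (12)) that Algorithm I bumps the smallest entry of a row that exceeds the incoming element, while Algorithm D reverses this by bumping the largest entry of the row above that is smaller than the element coming up.

```python
from bisect import bisect_left
from itertools import permutations


def rsk(columns):
    """Return (P, Q) for a two-line array given as a list of (q, p) columns,
    with the q's increasing and the p's distinct."""
    P, Q = [], []
    for q, x in columns:
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)        # first entry of row r exceeding x
            if j == len(P[r]):
                break                       # x fits at the end of row r
            P[r][j], x = x, P[r][j]         # x bumps that entry into row r + 1
            r += 1
        if r == len(P):                     # bumped out of the bottom: new row
            P.append([])
            Q.append([])
        P[r].append(x)                      # the newly filled position of P ...
        Q[r].append(q)                      # ... receives q in Q
    return P, Q


def rsk_inverse(P, Q):
    """Recover the list of (q, p) columns from equal-shape tableaux (P, Q)."""
    P = [row[:] for row in P]
    Q = [row[:] for row in Q]
    columns = []
    while P:
        r = max(range(len(Q)), key=lambda i: Q[i][-1])   # row holding largest q
        q, x = Q[r].pop(), P[r].pop()
        if not P[r]:                        # only the last row can become empty
            P.pop()
            Q.pop()
        for rr in range(r - 1, -1, -1):     # reverse bumping, moving upward
            j = bisect_left(P[rr], x) - 1   # largest entry of row rr below x
            P[rr][j], x = x, P[rr][j]
        columns.append((q, x))
    return columns[::-1]


def round_trips(n):
    """Check that rsk_inverse undoes rsk for every permutation of 1..n."""
    for perm in permutations(range(1, n + 1)):
        columns = list(zip(range(1, n + 1), perm))
        if rsk_inverse(*rsk(columns)) != columns:
            return False
    return True


P, Q = rsk([(1, 7), (3, 2), (5, 9), (6, 5), (8, 3)])
print(P, Q)   # [[2, 3], [5, 9], [7]] [[1, 5], [3, 6], [8]], as in (13)
```

Running `round_trips(6)` exercises the claim that the two constructions are inverses on all 720 permutations of {1, ..., 6}.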

The correspondence defined in the proof of Theorem A has many startling
properties, and we will now proceed to derive some of them. The reader is urged
to work out the example in exercise 1, in order to become familiar with the
construction, before proceeding further.

Once an element has been bumped from row 1 to row 2, it doesn't affect
row 1 any longer; furthermore rows 2, 3, ... are built up from the sequence of
bumped elements in exactly the same way as rows 1, 2, ... are built up from the
original permutation. These facts suggest that we can look at the construction
of Theorem A in another way, concentrating only on the first rows of P and Q.
For example, the permutation $\binom{1\ 3\ 5\ 6\ 8}{7\ 2\ 9\ 5\ 3}$ causes the following action in row 1,
according to (12):

    1: Insert 7, set $Q_{11} \leftarrow 1$.
    3: Insert 2, bump 7.
    5: Insert 9, set $Q_{12} \leftarrow 5$.            (14)
    6: Insert 5, bump 9.
    8: Insert 3, bump 5.

Thus the first row of P is 2 3, and the first row of Q is 1 5. Furthermore, the
remaining rows of P and Q are the tableaux corresponding to the "bumped"
two-line array

$$\begin{pmatrix} 3 & 6 & 8 \\ 7 & 9 & 5 \end{pmatrix}. \qquad (15)$$

In order to study the behavior of the construction on row 1, we can consider
the elements that go into a given column of this row. Let us say that $(q_i, p_i)$ is
in class t with respect to the two-line array

$$\begin{pmatrix} q_1 & q_2 & \ldots & q_n \\ p_1 & p_2 & \ldots & p_n \end{pmatrix}, \qquad q_1 < q_2 < \cdots < q_n, \quad p_1, p_2, \ldots, p_n \text{ distinct}, \qquad (16)$$

if $p_i = P_{1t}$ after Algorithm I has been applied successively to $p_1, p_2, \ldots, p_i$,
starting with an empty tableau P. (Remember that Algorithm I always inserts
the given element into row 1.)

It is easy to see that $(q_i, p_i)$ is in class 1 if and only if $p_i$ has i − 1 inversions,
that is, if and only if $p_i = \min\{p_1, p_2, \ldots, p_i\}$ is a "left-to-right minimum." If we
cross out the columns of class 1 in (16), we obtain another two-line array

$$\begin{pmatrix} q'_1 & q'_2 & \ldots & q'_m \\ p'_1 & p'_2 & \ldots & p'_m \end{pmatrix} \qquad (17)$$

such that (q, p) is in class t with respect to (17) if and only if it is in class t + 1 with
respect to (16). The operation of going from (16) to (17) represents removing
the leftmost position of row 1. This gives us a systematic way to determine the
classes. For example, in $\binom{1\ 3\ 5\ 6\ 8}{7\ 2\ 9\ 5\ 3}$ the elements that are left-to-right minima are
7 and 2, so class 1 is {(1, 7), (3, 2)}; in the remaining array $\binom{5\ 6\ 8}{9\ 5\ 3}$ all elements
are minima, so class 2 is {(5, 9), (6, 5), (8, 3)}. In the "bumped" array (15), class
1 is {(3, 7), (8, 5)} and class 2 is {(6, 9)}.
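
The two descriptions of classes given above, by the position of row 1 that $p_i$ enters and by repeatedly crossing out left-to-right minima as in passing from (16) to (17), can be compared mechanically. In this sketch (function names are our own) a two-line array is a list of (q, p) columns; on the example it reproduces the classes listed in the text.

```python
from bisect import bisect_left
from itertools import permutations


def classes_by_row1(columns):
    """Class of each column: the position of row 1 (1-based) that p enters."""
    row1, cls = [], {}
    for q, p in columns:
        j = bisect_left(row1, p)
        if j == len(row1):
            row1.append(p)          # p lands at the end of row 1
        else:
            row1[j] = p             # p bumps the old occupant of P[1][j+1]
        cls[(q, p)] = j + 1
    return cls


def classes_by_peeling(columns):
    """Class 1 = left-to-right minima of the p's; cross them out and repeat."""
    cls, t, rest = {}, 1, list(columns)
    while rest:
        remaining, smallest = [], float("inf")
        for q, p in rest:
            if p < smallest:        # a left-to-right minimum of the current array
                smallest = p
                cls[(q, p)] = t
            else:
                remaining.append((q, p))
        rest, t = remaining, t + 1
    return cls


def methods_agree(n):
    """Compare the two descriptions on every permutation of 1..n."""
    for perm in permutations(range(1, n + 1)):
        columns = list(zip(range(1, n + 1), perm))
        if classes_by_row1(columns) != classes_by_peeling(columns):
            return False
    return True


example = [(1, 7), (3, 2), (5, 9), (6, 5), (8, 3)]
print(classes_by_row1(example))
# {(1, 7): 1, (3, 2): 1, (5, 9): 2, (6, 5): 2, (8, 3): 2}
```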

For any fixed value of t, the elements of class t can be labeled

$$(q_{i_1}, p_{i_1}), \ldots, (q_{i_k}, p_{i_k})$$

in such a way that

$$q_{i_1} < q_{i_2} < \cdots < q_{i_k}, \qquad p_{i_1} > p_{i_2} > \cdots > p_{i_k}, \qquad (18)$$

since the tableau position $P_{1t}$ takes on the decreasing sequence of values $p_{i_1}, \ldots,
p_{i_k}$ as the insertion algorithm proceeds. At the end of the construction we have

$$P_{1t} = p_{i_k}, \qquad Q_{1t} = q_{i_1}, \qquad (19)$$

and the "bumped" two-line array that defines rows 2, 3, ... of P and Q contains
the columns

$$\begin{pmatrix} q_{i_2} & q_{i_3} & \ldots & q_{i_k} \\ p_{i_1} & p_{i_2} & \ldots & p_{i_{k-1}} \end{pmatrix} \qquad (20)$$

plus other columns formed in a similar way from the other classes.

These observations lead to a simple method for calculating P and Q by
hand (see exercise 3), and they also provide us with the means to prove a rather
unexpected result:

Theorem B. If the permutation

$$\begin{pmatrix} 1 & 2 & \ldots & n \\ a_1 & a_2 & \ldots & a_n \end{pmatrix}$$

corresponds to tableaux (P, Q) in the construction of Theorem A, then the
inverse permutation corresponds to (Q, P).

This fact is quite startling, since P and Q are formed by such completely
different methods in Theorem A, and since the inverse of a permutation is
obtained by juggling the columns of the two-line array rather capriciously.

Proof. Suppose that we have a two-line array (16); its columns are essentially
independent and can be rearranged. Interchanging the lines and sorting the
columns so that the new top line is in increasing order gives the "inverse" array

$$\begin{pmatrix} q_1 & q_2 & \ldots & q_n \\ p_1 & p_2 & \ldots & p_n \end{pmatrix}^{-} = \begin{pmatrix} p_1 & p_2 & \ldots & p_n \\ q_1 & q_2 & \ldots & q_n \end{pmatrix} = \begin{pmatrix} p'_1 & p'_2 & \ldots & p'_n \\ q'_1 & q'_2 & \ldots & q'_n \end{pmatrix}, \qquad p'_1 < p'_2 < \cdots < p'_n, \quad q'_1, q'_2, \ldots, q'_n \text{ distinct}. \qquad (21)$$

We will show that this operation corresponds to interchanging P and Q in the
construction of Theorem A.

Exercise 2 reformulates our remarks about class determination so that the
class of $(q_i, p_i)$ doesn't depend on the fact that $q_1, q_2, \ldots, q_n$ are in ascending
order. Since the resulting condition is symmetrical in the q's and the p's, the
operation (21) does not destroy the class structure; if (q, p) is in class t with
respect to (16), then (p, q) is in class t with respect to (21). If we therefore
arrange the elements of the latter class t as

$$p_{i_k} < \cdots < p_{i_2} < p_{i_1}, \qquad q_{i_k} > \cdots > q_{i_2} > q_{i_1}, \qquad (22)$$

by analogy with (18), we have

$$P_{1t} = q_{i_1}, \qquad Q_{1t} = p_{i_k} \qquad (23)$$

as in (19), and the columns

$$\begin{pmatrix} p_{i_{k-1}} & \ldots & p_{i_2} & p_{i_1} \\ q_{i_k} & \ldots & q_{i_3} & q_{i_2} \end{pmatrix} \qquad (24)$$

go into the "bumped" array as in (20). Hence the first rows of P and Q are
interchanged. Furthermore the "bumped" two-line array for (21) is the inverse
of the "bumped" two-line array for (16), so the proof is completed by induction
on the number of rows in the tableaux. ∎
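
Theorem B can be checked exhaustively for small n. This sketch (our own code, with hypothetical helper names) represents a permutation by its bottom line $a_1 \ldots a_n$ and uses the row-bumping insertion that we assume for Algorithm I.

```python
from bisect import bisect_left
from itertools import permutations


def rsk(perm):
    """(P, Q) for the permutation with two-line form (1 2 ... n / a_1 ... a_n)."""
    P, Q = [], []
    for q, x in enumerate(perm, start=1):
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)
            if j == len(P[r]):
                break
            P[r][j], x = x, P[r][j]     # bump into the next row
            r += 1
        if r == len(P):
            P.append([])
            Q.append([])
        P[r].append(x)
        Q[r].append(q)
    return P, Q


def inverse(perm):
    """Bottom line of the inverse permutation."""
    inv = [0] * len(perm)
    for i, a in enumerate(perm, start=1):
        inv[a - 1] = i
    return inv


def theorem_b_holds(n):
    """True if, for every permutation of 1..n, the inverse yields (Q, P)."""
    for perm in permutations(range(1, n + 1)):
        P, Q = rsk(perm)
        if rsk(inverse(perm)) != (Q, P):
            return False
    return True
```

For instance, 3 1 2 gives P = 1 2 / 3 and Q = 1 3 / 2, while its inverse 2 3 1 gives the same pair with the roles exchanged.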


Corollary B. The number of tableaux that can be formed from {1, 2, ..., n} is
the number of involutions on {1, 2, ..., n}.

Proof. If π is an involution corresponding to (P, Q), then $\pi = \pi^{-}$ corresponds
to (Q, P); hence P = Q. Conversely, if π is any permutation corresponding
to (P, P), then $\pi^{-}$ also corresponds to (P, P); hence $\pi = \pi^{-}$. So there is a
one-to-one correspondence between involutions π and tableaux P. ∎
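
Corollary B can likewise be tested for small n. The following sketch (names ours, with insertion by row bumping as assumed for Algorithm I) counts the distinct tableaux P that arise from all permutations of {1, ..., n}, compares that count with the number of involutions, and checks along the way that every involution yields a pair of the form (P, P).

```python
from bisect import bisect_left
from itertools import permutations


def rsk(perm):
    """(P, Q) for the permutation with bottom line perm (top line 1..n)."""
    P, Q = [], []
    for q, x in enumerate(perm, start=1):
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)
            if j == len(P[r]):
                break
            P[r][j], x = x, P[r][j]
            r += 1
        if r == len(P):
            P.append([])
            Q.append([])
        P[r].append(x)
        Q[r].append(q)
    return P, Q


def corollary_b_counts(n):
    """(number of distinct tableaux P, number of involutions,
    whether every involution gave P == Q) over all permutations of 1..n."""
    tableaux, involutions, diagonal = set(), 0, True
    for perm in permutations(range(1, n + 1)):
        P, Q = rsk(perm)
        tableaux.add(tuple(map(tuple, P)))
        if all(perm[a - 1] == i for i, a in enumerate(perm, start=1)):
            involutions += 1
            diagonal = diagonal and P == Q
    return len(tableaux), involutions, diagonal
```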

It is clear that the upper-left corner element of a tableau is always the
smallest. This suggests a possible way to sort a set of numbers: First we can
put the numbers into a tableau, by using Algorithm I repeatedly; this brings the
smallest element to the corner. Then we delete the smallest element, rearranging
the remaining elements so that they form another tableau; then we delete the
new smallest element; and so on.

Let us therefore consider what happens when we delete the corner element
from the tableau

     1  3  5  7 11 15
     2  6  8 14
     4  9 13
    10 12
    16                           (25)

If the 1 is removed, the 2 must come to take its place. Then we can move the
4 up to where the 2 was, but we can't move the 10 to the position of the 4; the
9 can be moved instead, then the 12 in place of the 9. In general, we are led to
the following procedure.


Algorithm S (Delete corner element). Given a tableau P, this algorithm deletes
the upper left corner element of P and moves other elements so that the tableau
properties are preserved. The notational conventions of Algorithms I and D are
used.

S1. [Initialize.] Set r ← 1, s ← 1.

S2. [Done?] If $P_{rs} = \infty$, the process is complete.

S3. [Compare.] If $P_{(r+1)s} \le P_{r(s+1)}$, go to step S5. (We examine the elements
just below and to the right of the vacant cell, and we will move the smaller
of the two.)

S4. [Shift left.] Set $P_{rs} \leftarrow P_{r(s+1)}$, s ← s + 1, and return to S3.

S5. [Shift up.] Set $P_{rs} \leftarrow P_{(r+1)s}$, r ← r + 1, and return to S2. ∎
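
Algorithm S translates almost line for line into Python. In this sketch (our own; `delete_corner`, `insert`, and `tableau_sort` are hypothetical names), rows are 0-based lists, cells outside the tableau count as ∞, and the test of step S2 is restated as "both neighbors of the vacant cell are ∞", which is equivalent. The sorting method described above then amounts to repeated insertion followed by repeated deletion of the corner.

```python
from bisect import bisect_left

INF = float("inf")


def insert(P, x):
    """Row insertion with bumping, in the manner assumed for Algorithm I."""
    r = 0
    while r < len(P):
        j = bisect_left(P[r], x)
        if j == len(P[r]):
            break
        P[r][j], x = x, P[r][j]
        r += 1
    if r == len(P):
        P.append([])
    P[r].append(x)


def delete_corner(P):
    """Algorithm S: remove P[0][0] in place; return it and the vacated cell.

    Indices are 0-based, so the book's P_rs corresponds to P[r-1][s-1].
    """
    def cell(r, s):
        return P[r][s] if r < len(P) and s < len(P[r]) else INF

    smallest = P[0][0]
    r = s = 0                                    # S1
    while True:
        below, right = cell(r + 1, s), cell(r, s + 1)
        if below == INF and right == INF:        # S2: nothing can move in
            break
        if below <= right:                       # S3 -> S5: shift up
            P[r][s] = below
            r += 1
        else:                                    # S4: shift left
            P[r][s] = right
            s += 1
    P[r].pop()                                   # the vacated cell ends row r
    if not P[r]:
        P.pop()                                  # only the last row can empty
    return smallest, (r, s)


def tableau_sort(values):
    """Sort distinct values: build a tableau, then delete corners repeatedly."""
    P = []
    for x in values:
        insert(P, x)
    return [delete_corner(P)[0] for _ in range(len(values))]
```

Applied to (25), `delete_corner` returns 1 and vacates 0-based cell (3, 1), that is, row 4, column 2, leaving the rows 2 3 5 7 11 15 / 4 6 8 14 / 9 12 13 / 10 / 16, exactly as described before the algorithm.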

It is easy to prove that P is still a tableau after Algorithm S has deleted its
corner element (see exercise 10). So if we repeat Algorithm S until P is empty,
we can read out its elements in increasing order. Unfortunately this doesn't
turn out to be as efficient a sorting algorithm as other methods we will see; its
minimum running time is proportional to $n^{1.5}$, but similar algorithms that use
trees instead of tableau structures have an execution time on the order of n log n.

In spite of the fact that Algorithm S doesn't lead to a superbly efficient
sorting algorithm, it has some very interesting properties.

Theorem C (M. P. Schützenberger). If P is the tableau formed by the
construction of Theorem A from the permutation $a_1 a_2 \ldots a_n$, and if

$$a_i = \min\{a_1, a_2, \ldots, a_n\},$$

then Algorithm S changes P to the tableau corresponding to $a_1 \ldots a_{i-1} a_{i+1} \ldots a_n$.

Proof. See exercise 13. ∎
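
Theorem C is also easy to check exhaustively for small n, since the minimum of a permutation of {1, ..., n} is always 1. The sketch below (our own helper names, with insertion by row bumping and Algorithm S implemented directly, cells outside the tableau counting as ∞) compares both sides of the theorem for every permutation.

```python
from bisect import bisect_left
from itertools import permutations

INF = float("inf")


def build(values):
    """Tableau obtained by inserting the values one by one (row bumping)."""
    P = []
    for x in values:
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)
            if j == len(P[r]):
                break
            P[r][j], x = x, P[r][j]
            r += 1
        if r == len(P):
            P.append([])
        P[r].append(x)
    return P


def delete_corner(P):
    """Algorithm S, in place; cells outside P count as infinity."""
    def cell(r, s):
        return P[r][s] if r < len(P) and s < len(P[r]) else INF

    r = s = 0
    while True:
        below, right = cell(r + 1, s), cell(r, s + 1)
        if below == INF and right == INF:
            break
        if below <= right:          # shift up
            P[r][s] = below
            r += 1
        else:                       # shift left
            P[r][s] = right
            s += 1
    P[r].pop()
    if not P[r]:
        P.pop()


def theorem_c_holds(n):
    """Check Theorem C for every permutation of 1..n (whose minimum is 1)."""
    for perm in permutations(range(1, n + 1)):
        P = build(perm)
        delete_corner(P)
        if P != build([a for a in perm if a != 1]):
            return False
    return True
```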

After we apply Algorithm S to a tableau, let us put the deleted element into
the newly vacated place $P_{rs}$, but in italic type to indicate that it isn't really part
of the tableau (in the diagrams here, such italic entries are marked with
asterisks). For example, after applying this procedure to the tableau (25)
we would have

     2  3  5  7 11 15
     4  6  8 14
     9 12 13
    10 *1*
    16

and two more applications yield

     4  5  7 11 15 *2*
     6  8 13 14
     9 12 *3*
    10 *1*
    16

Continuing until all elements are removed gives

    16 14 13 12 10  2
    15  9  6  4
    11  5  3
     8  1
     7                           (26)

which has the same shape as the original tableau (25). This configuration may
be called a dual tableau, since it is like a tableau except that the "dual order"
has been used (reversing the roles of < and >). Let us denote the dual tableau
formed from P in this way by the symbol $P^S$.
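
The construction of $P^S$ can be sketched as repeated application of Algorithm S, recording where each deleted element lands. The names `evacuate` and `evacuate_dual` are ours; for the dual-order version we simply negate all entries, apply the ordinary algorithm, and negate back, which reverses the roles of < and > as the text requires. On (25) the sketch reproduces (26), and applying the dual version to (26) recovers (25).

```python
INF = float("inf")


def delete_corner(P):
    """Algorithm S in place; returns the deleted element and the vacated cell
    (0-based). Cells outside the tableau are treated as infinity."""
    def cell(r, s):
        return P[r][s] if r < len(P) and s < len(P[r]) else INF

    deleted, r, s = P[0][0], 0, 0
    while True:
        below, right = cell(r + 1, s), cell(r, s + 1)
        if below == INF and right == INF:
            break
        if below <= right:            # shift up
            P[r][s] = below
            r += 1
        else:                         # shift left
            P[r][s] = right
            s += 1
    P[r].pop()
    if not P[r]:
        P.pop()
    return deleted, (r, s)


def evacuate(P):
    """P^S: apply Algorithm S until P is empty, writing each deleted element
    into the cell that its deletion vacated."""
    work = [row[:] for row in P]
    result = [[None] * len(row) for row in P]
    while work:
        x, (r, s) = delete_corner(work)
        result[r][s] = x
    return result


def negate(T):
    return [[-x for x in row] for row in T]


def evacuate_dual(D):
    """The same algorithm in the dual order, simulated by negating entries."""
    return negate(evacuate(negate(D)))


T25 = [[1, 3, 5, 7, 11, 15], [2, 6, 8, 14], [4, 9, 13], [10, 12], [16]]
T26 = evacuate(T25)
print(T26)   # [[16, 14, 13, 12, 10, 2], [15, 9, 6, 4], [11, 5, 3], [8, 1], [7]]
```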

From $P^S$ we can determine P uniquely; in fact, we can obtain the original
tableau P from $P^S$ by applying exactly the same algorithm, but reversing the
order and the roles of italic and regular type, since $P^S$ is a dual tableau. For
example, two steps of the algorithm applied to (26) give

    14 13 12 10  2 15
    11  9  6  4
     8  5  3
     7  1
    16

and eventually (25) will be reproduced again! This remarkable fact is one of the
consequences of our next theorem.

Theorem D (C. Schensted, M. P. Schützenberger). Let

$$\begin{pmatrix} q_1 & q_2 & \ldots & q_n \\ p_1 & p_2 & \ldots & p_n \end{pmatrix} \qquad (27)$$

be the two-line array corresponding to the tableaux (P, Q).

a) Using dual (reverse) order on the q's, but not on the p's, the two-line array

$$\begin{pmatrix} q_n & \ldots & q_2 & q_1 \\ p_n & \ldots & p_2 & p_1 \end{pmatrix} \qquad (28)$$

corresponds to $(P^T, (Q^S)^T)$.

As usual, "T" denotes the operation of transposing rows and columns; $P^T$ is a
tableau, while $(Q^S)^T$ is a dual tableau, since the order of the q's is reversed.

b) Using dual order on the p's, but not on the q's, the two-line array (27)
corresponds to $((P^S)^T, Q^T)$.

c) Using dual order on both the p's and the q's, the two-line array (28)
corresponds to $(P^S, Q^S)$.


Proof. No simple proof of this theorem is known. The fact that case (a)
corresponds to $(P^T, X)$ for some dual tableau X is proved in exercise 5; hence
by Theorem B, case (b) corresponds to $(Y, Q^T)$ for some dual tableau Y, and
Y must have the shape of $P^T$.

Let $p_i = \min\{p_1, \ldots, p_n\}$; since $p_i$ is the "largest" element in the dual order,
it appears on the periphery of Y, and it doesn't bump any elements in the
construction of Theorem A. Thus, if we successively insert $p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n$
using the dual order, we get $Y - \{p_i\}$, that is, Y with $p_i$ removed. By Theorem C,
if we successively insert $p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n$ using the normal order, we get
the tableau d(P) obtained by applying Algorithm S to P. By induction on n,
$Y - \{p_i\} = (d(P)^S)^T$. But since

$$(P^S)^T - \{p_i\} = (d(P)^S)^T, \qquad (29)$$

by definition of the operation S, and since Y has the same shape as $(P^S)^T$, we
must have $Y = (P^S)^T$.

This proves part (b), and part (a) follows by an application of Theorem B.
Applying parts (a) and (b) successively then shows that case (c) corresponds
to $(((P^T)^S)^T, ((Q^S)^T)^T)$, and this is $(P^S, Q^S)$ since $(P^S)^T = (P^T)^S$ by the
row-column symmetry of operation S. ∎


In particular, this theorem establishes two surprising facts about the tableau
insertion algorithm: If successive insertion of distinct elements $p_1, \ldots, p_n$ into an
empty tableau yields tableau P, insertion in the opposite order $p_n, \ldots, p_1$ yields
the transposed tableau $P^T$. And if we not only insert the p's in this order
$p_n, \ldots, p_1$ but also interchange the roles of < and >, as well as 0 and ∞, in
the insertion process, we obtain the dual tableau $P^S$. The reader is urged to
try out these processes on some simple examples. The unusual nature of these
coincidences might lead us to suspect that some sort of witchcraft is operating
behind the scenes! No simple explanation for these phenomena is yet known;
there seems to be no obvious way to prove even that case (c) corresponds to
tableaux having the same shape as P and Q, although the characterization of
classes in exercise 2 does provide a significant clue.
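
Both facts can be verified by brute force for small n. In this sketch (all names ours), insertion in the dual order is simulated by negating the elements, inserting normally, and negating back, and $P^S$ is computed by repeated application of Algorithm S, with each deleted element written into the cell it vacated.

```python
from bisect import bisect_left
from itertools import permutations

INF = float("inf")


def build(values):
    """Insert the values one at a time by row bumping; return the tableau."""
    P = []
    for x in values:
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)
            if j == len(P[r]):
                break
            P[r][j], x = x, P[r][j]
            r += 1
        if r == len(P):
            P.append([])
        P[r].append(x)
    return P


def build_dual(values):
    """Insertion with < and > (and 0 and infinity) interchanged, simulated by
    negating the values, inserting normally, and negating back."""
    return [[-x for x in row] for row in build([-v for v in values])]


def delete_corner(P):
    """Algorithm S in place; returns the deleted element and vacated cell."""
    def cell(r, s):
        return P[r][s] if r < len(P) and s < len(P[r]) else INF

    deleted, r, s = P[0][0], 0, 0
    while True:
        below, right = cell(r + 1, s), cell(r, s + 1)
        if below == INF and right == INF:
            break
        if below <= right:
            P[r][s] = below
            r += 1
        else:
            P[r][s] = right
            s += 1
    P[r].pop()
    if not P[r]:
        P.pop()
    return deleted, (r, s)


def evacuate(P):
    """P^S: each element deleted by Algorithm S goes into the cell it vacated."""
    work, result = [row[:] for row in P], [[None] * len(row) for row in P]
    while work:
        x, (r, s) = delete_corner(work)
        result[r][s] = x
    return result


def transpose(T):
    return [[row[j] for row in T if j < len(row)] for j in range(len(T[0]))]


def facts_hold(n):
    """Check both facts for every permutation of 1..n."""
    for perm in permutations(range(1, n + 1)):
        P, backwards = build(perm), perm[::-1]
        if build(backwards) != transpose(P) or build_dual(backwards) != evacuate(P):
            return False
    return True
```

For example, inserting 3, 5, 9, 2, 7 gives the rows 2 5 7 / 3 9, the transpose of the tableau P in (13) built from 7, 2, 9, 5, 3.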

The correspondence of Theorem A was given by G. de B. Robinson [American
J. Math. 60 (1938), 745–760, §5], in a somewhat vague and different form,
as part of his solution to a rather difficult problem in group theory. Robinson
stated Theorem B without proof. Many years later, C. Schensted independently
rediscovered the correspondence, which he described in terms of "bumping" as
we have done in Algorithm I; Schensted also proved the "P" part of Theorem
D(a) [see Canadian J. Math. 13 (1961), 179–191]. M. P. Schützenberger [Math.
Scand. 12 (1963), 117–128] proved Theorem C and the "Q" part of Theorem
D(a), from which (b) and (c) follow. It is possible to extend the correspondence
to permutations of multisets; the case that $p_1, \ldots, p_n$ need not be distinct was
considered by Schensted, and the "ultimate" generalization to the case that both
the p's and the q's may contain repeated elements was investigated by Knuth
[Pacific J. Math. 34 (1970), 709–727].

Let us now turn to a related question: How many tableaux formed from
{1, 2, ..., n} have a given shape $(n_1, n_2, \ldots, n_m)$, where $n_1 + n_2 + \cdots + n_m = n$?
If we denote this number by $f(n_1, n_2, \ldots, n_m)$, and if we allow the parameters $n_j$
to be arbitrary integers, the function f must satisfy the relations

$$f(n_1, n_2, \ldots, n_m) = 0 \quad \text{unless } n_1 \ge n_2 \ge \cdots \ge n_m \ge 0; \qquad (30)$$

$$f(n_1, n_2, \ldots, n_m, 0) = f(n_1, n_2, \ldots, n_m); \qquad (31)$$

$$f(n_1, n_2, \ldots, n_m) = f(n_1 - 1, n_2, \ldots, n_m) + f(n_1, n_2 - 1, \ldots, n_m) + \cdots + f(n_1, n_2, \ldots, n_m - 1), \quad \text{if } n_1 \ge n_2 \ge \cdots \ge n_m \ge 1. \qquad (32)$$

Recurrence (32) comes from the fact that a tableau with its largest element
removed is always another tableau; for example, the number of tableaux of shape
(6, 4, 4, 1) is f(5, 4, 4, 1) + f(6, 3, 4, 1) + f(6, 4, 3, 1) + f(6, 4, 4, 0) = f(5, 4, 4, 1) +
f(6, 4, 3, 1) + f(6, 4, 4), since every tableau of shape (6, 4, 4, 1) on {1, 2, ..., 15}
is formed by inserting the element 15 into the appropriate place in a tableau of
shape (5, 4, 4, 1), (6, 4, 3, 1), or (6, 4, 4). Schematically:

[Diagram (33): the shape (6, 4, 4, 1) shown as the sum of the shapes (5, 4, 4, 1), (6, 4, 3, 1), and (6, 4, 4).]

The function $f(n_1, n_2, \ldots, n_m)$ that satisfies these relations has a fairly
simple form,

$$f(n_1, n_2, \ldots, n_m) = \frac{\Delta(n_1 + m - 1, n_2 + m - 2, \ldots, n_m)\, n!}{(n_1 + m - 1)!\,(n_2 + m - 2)!\,\ldots\,n_m!}, \qquad (34)$$
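provided that the relatively mild conditions

$$n_1 + m - 1 \ge n_2 + m - 2 \ge \cdots \ge n_m$$

are satisfied; here Δ denotes the "square root of the discriminant" function

$$\Delta(x_1, x_2, \ldots, x_m) = \det\begin{pmatrix} x_1^{m-1} & x_2^{m-1} & \ldots & x_m^{m-1} \\ \vdots & \vdots & & \vdots \\ x_1^2 & x_2^2 & \ldots & x_m^2 \\ x_1 & x_2 & \ldots & x_m \\ 1 & 1 & \ldots & 1 \end{pmatrix} = \prod_{1 \le i < j \le m} (x_i - x_j). \qquad (35)$$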


Formula (34) was derived by G. Frobenius [Sitzungsberichte preuß. Akad. der
Wissenschaften (1900), 516–534, §3], in connection with an equivalent problem
in group theory, using a rather deep group-theoretical argument; a combinatorial
proof was given independently by MacMahon [Philosophical Trans. A209 (1909),
153–175]. The formula can be established by induction, since relations (30) and
(31) are readily proved and (32) follows by setting y = −1 in the identity of
exercise 17.
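
Formula (34) and relations (30)–(32) can be compared directly. The sketch below (names ours) evaluates (34) with the product form of Δ from (35) and, independently, computes f from the recurrence alone, then lets us check that the two agree on every shape with at most 9 cells.

```python
from functools import lru_cache
from math import factorial, prod


def delta(xs):
    """Delta(x_1, ..., x_m) = product of (x_i - x_j) over i < j, as in (35)."""
    return prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))


def f_formula(shape):
    """Formula (34), for a partition n_1 >= ... >= n_m >= 0."""
    m, n = len(shape), sum(shape)
    xs = [shape[i] + m - 1 - i for i in range(m)]       # n_1+m-1, ..., n_m
    return delta(xs) * factorial(n) // prod(factorial(x) for x in xs)


@lru_cache(maxsize=None)
def f_recurrence(shape):
    """f computed from relations (30), (31), (32) only; shape is a tuple."""
    if any(a < b for a, b in zip(shape, shape[1:])) or (shape and shape[-1] < 0):
        return 0                                        # (30)
    if shape and shape[-1] == 0:
        return f_recurrence(shape[:-1])                 # (31)
    if not shape:
        return 1                                        # the empty tableau
    return sum(f_recurrence(shape[:i] + (shape[i] - 1,) + shape[i + 1:])
               for i in range(len(shape)))              # (32)


def partitions(n, largest=None):
    """Partitions of n as nonincreasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest
```

For the shape (6, 4, 4, 1) used above, both functions give 80080.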

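Theorem A gives a remarkable identity in connection with this formula for
the number of tableaux. If we sum over all shapes, we have

$$\begin{aligned} n! &= \sum_{\substack{k_1 \ge k_2 \ge \cdots \ge k_n \ge 0 \\ k_1 + k_2 + \cdots + k_n = n}} f(k_1, k_2, \ldots, k_n)^2 \\ &= n!^2 \sum_{\substack{k_1 \ge k_2 \ge \cdots \ge k_n \ge 0 \\ k_1 + k_2 + \cdots + k_n = n}} \frac{\Delta(k_1 + n - 1, k_2 + n - 2, \ldots, k_n)^2}{(k_1 + n - 1)!^2\,(k_2 + n - 2)!^2 \ldots k_n!^2} \\ &= n!^2 \sum_{\substack{q_1 > q_2 > \cdots > q_n \ge 0 \\ q_1 + q_2 + \cdots + q_n = (n+1)n/2}} \frac{\Delta(q_1, q_2, \ldots, q_n)^2}{q_1!^2\, q_2!^2 \ldots q_n!^2}; \end{aligned}$$

hence

$$\sum_{\substack{q_1 + q_2 + \cdots + q_n = (n+1)n/2 \\ q_1, q_2, \ldots, q_n \ge 0}} \frac{\Delta(q_1, q_2, \ldots, q_n)^2}{q_1!^2\, q_2!^2 \ldots q_n!^2} = 1. \qquad (36)$$

The inequalities $q_1 > q_2 > \cdots > q_n$ have been removed in the latter sum, since
the summand is a symmetric function of the q's that vanishes when $q_i = q_j$;
thus the unrestricted sum is n! times the restricted one, which is 1/n!.
A similar identity appears in exercise 24.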
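
Identity (36) can be confirmed with exact rational arithmetic for small n. In this sketch (names ours) the sum runs over all n-tuples of nonnegative integers with the stated total; terms with two equal q's vanish because Δ = 0 there.

```python
from fractions import Fraction
from math import factorial, prod


def delta(xs):
    """Product of (x_i - x_j) over i < j, as in (35)."""
    return prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))


def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def identity_36_sum(n):
    """Left-hand side of (36), computed exactly."""
    total = Fraction(0)
    for qs in compositions(n * (n + 1) // 2, n):
        d = delta(qs)
        if d:                                   # terms with q_i = q_j vanish
            total += Fraction(d * d, prod(factorial(q) for q in qs) ** 2)
    return total
```

For n = 2, for example, the four tuples (0, 3), (1, 2), (2, 1), (3, 0) each contribute 1/4.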

The formula for the number of tableaux can also be expressed in a much
more interesting way, based on the idea of "hooks." The hook corresponding to
a cell in a tableau is defined to be the cell itself plus the cells lying below it
in the same column and to its right in the same row. For example, the shaded
area in Fig. 5 is the hook corresponding to cell (2, 3) in row 2, column 3; it
contains six cells. Each cell of Fig. 5 has been filled in with the length of its hook.

Fig. 5. Hooks and hook lengths.

If the shape of the tableau is $(n_1, n_2, \ldots, n_m)$, the longest hook has length
$n_1 + m - 1$. Further examination of the hook lengths shows that row 1 contains
all the lengths $n_1 + m - 1, n_1 + m - 2, \ldots, 1$ except for $(n_1 + m - 1) - (n_m)$,
$(n_1 + m - 1) - (n_{m-1} + 1)$, ..., $(n_1 + m - 1) - (n_2 + m - 2)$. In Fig. 5, for example,
the hook lengths in row 1 are 12, 11, 10, ..., 1 except for 10, 9, 6, 3, 2; the
exceptions correspond to five nonexistent hooks, from nonexistent cells (6, 3),
(5, 3), (4, 5), (3, 7), (2, 7) leading up to cell (1, 7). Similarly, row j contains
all lengths $n_j + m - j, \ldots, 1$, except for $(n_j + m - j) - (n_m), \ldots,
(n_j + m - j) - (n_{j+1} + m - j - 1)$. It follows that the product of all the hook lengths is equal to

$$\frac{(n_1 + m - 1)!\,(n_2 + m - 2)!\,\ldots\,n_m!}{\Delta(n_1 + m - 1, n_2 + m - 2, \ldots, n_m)}.$$

This is just what happens in Eq. (34), so we have derived the following celebrated
result due to J. S. Frame, G. de B. Robinson, and R. M. Thrall [Canadian J.
Math. 6 (1954), 316–318]:

Theorem H. The number of tableaux on {1, 2, ..., n} having a specified shape
is n! divided by the product of the hook lengths. ∎
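
Theorem H is a one-line computation once the hook lengths are known. In this sketch (names ours) the hook length of a cell is the number of cells to its right in its row, plus the number below it in its column, plus one. We take the shape of Fig. 5 to be (7, 6, 6, 4, 2, 2); this is our reading of the figure, inferred from the row-1 hook lengths and the nonexistent cells listed above.

```python
from math import factorial, prod


def hook_lengths(shape):
    """Hook length of each cell: cells to the right + cells below + 1."""
    heights = [sum(1 for length in shape if length > j) for j in range(shape[0])]
    return [[(shape[i] - j - 1) + (heights[j] - i - 1) + 1 for j in range(shape[i])]
            for i in range(len(shape))]


def count_tableaux(shape):
    """Theorem H: n! divided by the product of all hook lengths."""
    hooks = prod(h for row in hook_lengths(shape) for h in row)
    return factorial(sum(shape)) // hooks


def delta(xs):
    return prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))


def hook_product_via_delta(shape):
    """(n_1+m-1)! (n_2+m-2)! ... n_m! / Delta(...), as derived in the text."""
    m = len(shape)
    xs = [shape[i] + m - 1 - i for i in range(m)]
    return prod(factorial(x) for x in xs) // delta(xs)


fig5 = (7, 6, 6, 4, 2, 2)              # our reading of the shape in Fig. 5
print(hook_lengths(fig5)[0])           # [12, 11, 8, 7, 5, 4, 1]
print(count_tableaux((6, 4, 4, 1)))    # 80080
```

Row 1 of Fig. 5 then consists of 1 through 12 with 10, 9, 6, 3, 2 missing, and cell (2, 3) has hook length 6, matching the description in the text.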

Since this is such a simple rule, it deserves a simple proof; a heuristic
argument runs as follows: Each element of the tableau is the smallest in its
hook. If we fill the tableau shape at random, the probability that cell (i, j) will
contain the minimum element of the corresponding hook is the reciprocal of the
hook length; multiplying these probabilities over all i and j gives Theorem H.
But unfortunately this argument is fallacious, since the probabilities are far from
independent! No direct proof of Theorem H, based on combinatorial properties of
hooks used correctly, was known until 1992 (see exercise 39), although researchers
did discover several instructive indirect proofs (exercises 35, 36, and 38).

Theorem H has an interesting connection with the enumeration of trees,
which we considered in Chapter 2. We observed that binary trees with n nodes
correspond to permutations that can be obtained with a stack, and that such
permutations correspond to sequences $a_1 a_2 \ldots a_{2n}$ of n S's and n X's, where the
number of S's is never less than the number of X's as we read from left to right.
(See exercises 2.2.1–3 and 2.3.1–6.) The latter sequences correspond in a natural
way to tableaux of shape (n, n); we place in row 1 the indices i such that $a_i = S$,
and in row 2 we put those indices with $a_i = X$. For example, the sequence

    S S S X X S S X X S X X

corresponds to the tableau

     1  2  3  6  7 10
     4  5  8  9 11 12            (37)


The column constraint is satisfied in this tableau if and only if the number of X's
never exceeds the number of S's from left to right. By Theorem H, the number
of tableaux of shape (n, n) is

$$\frac{(2n)!}{(n+1)!\,n!},$$

so this is the number of binary trees, in agreement with Eq. 2.3.4.4–(14).
Furthermore, this argument solves the more general "ballot problem" considered in
the answer to exercise 2.2.1–4, if we use tableaux of shape (n, m) for n ≥ m.
So Theorem H includes some rather complex enumeration problems as simple
special cases.
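
The correspondence with sequences of S's and X's can be tested directly. The sketch below (names ours) builds the two rows from each sequence, checks that the column constraint holds exactly when the sequence satisfies the ballot condition, and counts the sequences that pass, so that the count can be compared with $(2n)!/((n+1)!\,n!)$.

```python
from itertools import combinations
from math import factorial


def sequence_to_tableau(seq):
    """Row 1 gets the positions of the S's, row 2 the positions of the X's."""
    row1 = [i for i, c in enumerate(seq, start=1) if c == "S"]
    row2 = [i for i, c in enumerate(seq, start=1) if c == "X"]
    return row1, row2


def is_ballot(seq):
    """Never more X's than S's when reading from left to right."""
    excess = 0
    for c in seq:
        excess += 1 if c == "S" else -1
        if excess < 0:
            return False
    return True


def count_tableaux_2n(n):
    """Count S/X sequences whose two rows satisfy the column constraint,
    checking along the way that this matches the ballot condition."""
    count = 0
    for s_positions in combinations(range(2 * n), n):
        seq = ["S" if i in s_positions else "X" for i in range(2 * n)]
        row1, row2 = sequence_to_tableau(seq)
        columns_ok = all(a < b for a, b in zip(row1, row2))
        if columns_ok != is_ballot(seq):
            raise AssertionError("column constraint and ballot condition differ")
        count += columns_ok
    return count
```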

Any tableau A of shape (n, n) on the elements {1, 2, ..., 2n} corresponds
to two tableaux (P, Q) of the same shape, in the following way suggested by
MacMahon [Combinatory Analysis 1 (1915), 130–131]: Let P consist of the
elements {1, ..., n} as they appear in A; then Q is formed by taking the remaining
elements, rotating the configuration by 180°, and replacing n + 1, n + 2, ..., 2n
by n, n − 1, ..., 1, respectively. For example, (37) splits into

     1  2  3  6                     7 10
     4  5          and        8  9 11 12

and rotation and renaming of the latter yields

    P =  1  2  3  6      Q =  1  2  4  5
         4  5                 3  6            (38)
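
MacMahon's splitting rule is short enough to code directly. In this sketch (function names ours), the 180° rotation reverses both the list of rows and each row, and the renaming maps n + k to n + 1 − k; on (37) it reproduces (38), and for small n we can check that P and Q are always tableaux of the same shape.

```python
from itertools import combinations


def macmahon_split(A, n):
    """Split a tableau A of shape (n, n) on {1, ..., 2n} into (P, Q)."""
    P = [row for row in ([x for x in r if x <= n] for r in A) if row]
    rest = [[x for x in r if x > n] for r in A]
    rotated = [row[::-1] for row in rest[::-1]]                  # turn by 180 degrees
    Q = [[2 * n + 1 - x for x in row] for row in rotated if row]  # n+k -> n+1-k
    return P, Q


def is_tableau(T):
    """Rows increase to the right and columns increase downward."""
    rows_ok = all(a < b for row in T for a, b in zip(row, row[1:]))
    cols_ok = all(T[i][j] < T[i + 1][j]
                  for i in range(len(T) - 1) for j in range(len(T[i + 1])))
    return rows_ok and cols_ok


def two_row_tableaux(n):
    """All tableaux of shape (n, n) on {1, ..., 2n}."""
    for first in combinations(range(1, 2 * n + 1), n):
        second = [x for x in range(1, 2 * n + 1) if x not in first]
        if all(a < b for a, b in zip(first, second)):
            yield [list(first), second]
```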


Conversely, any pair of equal-shape tableaux of at most two rows, each containing
n cells, corresponds in this way to a tableau of shape (n, n). Hence by exercise 7
the number of permutations $a_1 a_2 \ldots a_n$ of {1, 2, ..., n} containing no decreasing
subsequence $a_i > a_j > a_k$ for i < j < k is the number of binary trees with
n nodes. An interesting one-to-one correspondence between such permutations
and binary trees, more direct than the roundabout method via Algorithm I that
we have used here, has been found by D. Rotem [Inf. Proc. Letters 4 (1975),
58–61]; similarly there is a rather direct correspondence between binary trees
and permutations having no instances of $a_i > a_k > a_j$ for i < j < k (see exercise
2.2.1–5).
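
The counting claims in the last paragraph can be checked by brute force for small n. This sketch (names ours) counts permutations with no decreasing subsequence of length three, permutations with no instance of $a_i > a_k > a_j$, and permutations whose tableau P (built by row bumping) has at most two rows; all three counts should equal the number of binary trees with n nodes, $\binom{2n}{n}/(n+1)$.

```python
from bisect import bisect_left
from itertools import combinations, permutations
from math import comb


def has_pattern(a, pattern):
    """Is there i < j < k with (a_i, a_j, a_k) ordered like `pattern`?
    (3, 2, 1) means a_i > a_j > a_k; (3, 1, 2) means a_i > a_k > a_j."""
    for triple in combinations(a, 3):
        ranks = tuple(sorted(triple).index(v) + 1 for v in triple)
        if ranks == pattern:
            return True
    return False


def rows_of_p(perm):
    """Number of rows of the tableau P built by repeated row insertion."""
    P = []
    for x in perm:
        r = 0
        while r < len(P):
            j = bisect_left(P[r], x)
            if j == len(P[r]):
                break
            P[r][j], x = x, P[r][j]
            r += 1
        if r == len(P):
            P.append([])
        P[r].append(x)
    return len(P)


def counts(n):
    perms = list(permutations(range(1, n + 1)))
    no_321 = sum(not has_pattern(p, (3, 2, 1)) for p in perms)
    no_312 = sum(not has_pattern(p, (3, 1, 2)) for p in perms)
    two_rows = sum(rows_of_p(p) <= 2 for p in perms)
    return no_321, no_312, two_rows
```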

The number of ways to fill a tableau of shape (6, 4, 4, 1) is obviously the
number of ways to put the labels {1, 2, ..., 15} onto the vertices of the directed
graph

[Diagram (39): one vertex for each of the 15 cells of shape (6, 4, 4, 1), with an arc from each cell to the cell immediately to its right and to the cell immediately below it.]

in such a way that the label of vertex u is less than the label of vertex v whenever
u → v. In other words, it is the number of ways to sort the partial ordering (39)
topologically, in the sense of Section 2.2.3.

In general, we can ask the same question for any directed graph that contains
no oriented cycles. It would be nice if there were some simple formula generalizing
Theorem H to the case of an arbitrary directed graph; but not all graphs have
such pleasant properties as the graphs corresponding to tableaux. Some other
classes of directed graphs for which the labeling problem has a simple solution
are discussed in the exercises at the close of this section. Other exercises show
that some directed graphs have no simple formula corresponding to Theorem H.
For example, the number of ways to do the labeling is not always a divisor of n!.
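
The number of labelings of an acyclic directed graph can be computed by dynamic programming over the sets of vertices that have already received the smallest labels. The sketch below is one such method, not one taken from the text, and its names are ours. For the graph of shape (6, 4, 4, 1) it agrees with Theorem H, and the four-vertex graph with arcs a → c, b → c, b → d (our own small example) has 5 labelings, which does not divide 4! = 24.

```python
from functools import lru_cache


def count_labelings(vertices, arcs):
    """Number of ways to label the vertices 1..N so that u -> v implies
    label(u) < label(v), i.e. the number of topological sorts."""
    vertices = list(vertices)
    index = {v: i for i, v in enumerate(vertices)}
    must_precede = [0] * len(vertices)            # bitmask of predecessors
    for u, v in arcs:
        must_precede[index[v]] |= 1 << index[u]
    everything = (1 << len(vertices)) - 1

    @lru_cache(maxsize=None)
    def ways(labeled):                            # bitmask of labeled vertices
        if labeled == everything:
            return 1
        return sum(ways(labeled | (1 << i))
                   for i in range(len(vertices))
                   if ((labeled >> i) & 1) == 0
                   and (must_precede[i] & labeled) == must_precede[i])

    return ways(0)


def tableau_graph(shape):
    """Cells of the shape, with arcs to the right neighbor and the one below."""
    cells = [(i, j) for i, length in enumerate(shape) for j in range(length)]
    present = set(cells)
    arcs = [(c, d) for c in cells
            for d in ((c[0], c[1] + 1), (c[0] + 1, c[1])) if d in present]
    return cells, arcs
```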

To complete our investigations, let us count the total number of tableaux
that can be formed from n distinct elements; we will denote this number by $t_n$.
By Corollary B, $t_n$ is the number of involutions of {1, 2, ..., n}. A permutation
is its own inverse if and only if its cycle form consists solely of one-cycles (fixed
points) and two-cycles (transpositions). Since $t_{n-1}$ of the $t_n$ involutions have
(n) as a one-cycle, and since $t_{n-2}$ of them have (j n) as a two-cycle, for fixed
j < n, we obtain the formula

$$t_n = t_{n-1} + (n - 1)\,t_{n-2}, \qquad (40)$$

which Rothe devised in 1800 to tabulate $t_n$ for small n. The values for n ≥ 0
are 1, 1, 2, 4, 10, 26, 76, 232, 764, 2620, 9496, ....
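
Recurrence (40) is easily compared with a direct count of involutions. The function names in this sketch are ours; permutations of {0, ..., n − 1} are used so that a permutation can index itself.

```python
from itertools import permutations


def involution_counts(limit):
    """t_0, ..., t_limit from Rothe's recurrence (40)."""
    t = [1, 1]
    for n in range(2, limit + 1):
        t.append(t[n - 1] + (n - 1) * t[n - 2])
    return t[:limit + 1]


def brute_force_involutions(n):
    """Count permutations p of 0..n-1 with p[p[i]] == i for all i."""
    return sum(all(p[p[i]] == i for i in range(n)) for p in permutations(range(n)))
```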

Counting another way, let us suppose that there are k two-cycles and (n − 2k)
one-cycles. There are $\binom{n}{2k}$ ways to choose the fixed points, and the multinomial
coefficient $(2k)!/(2!)^k$ is the number of ways to arrange the other elements
into k distinguishable transpositions; dividing by k! to make the transpositions
indistinguishable we therefore obtain

$$t_n = \sum_{k=0}^{\lfloor n/2 \rfloor} t_n(k), \qquad t_n(k) = \frac{n!}{(n - 2k)!\, 2^k\, k!}. \qquad (41)$$

Unfortunately, this sum has no simple closed form (unless we choose to regard the
Hermite polynomial $i^n 2^{-n/2} H_n(-i/\sqrt{2})$ as simple), so we resort to two indirect
approaches in order to understand $t_n$ better:

a) We can find the generating function

$$\sum_n t_n z^n / n! = e^{z + z^2/2}; \qquad (42)$$

see exercise 25.

b) We can determine the asymptotic behavior of $t_n$. This is an instructive
problem, because it involves some general techniques that will be useful to
us in other connections, so we will conclude this section with an analysis of
the asymptotic behavior of $t_n$.
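
Formulas (41) and (42) can be checked against the values listed after (40). This sketch (names ours) evaluates (41) exactly and extracts the coefficients of $e^{z+z^2/2}$ as the Cauchy product of the series for $e^z$ and $e^{z^2/2}$, using exact fractions.

```python
from fractions import Fraction
from math import factorial


def t_from_sum(n):
    """Formula (41): the sum over k of n! / ((n - 2k)! 2^k k!)."""
    return sum(factorial(n) // (factorial(n - 2 * k) * 2**k * factorial(k))
               for k in range(n // 2 + 1))


def t_from_egf(limit):
    """n! times the coefficient of z^n in exp(z) * exp(z^2 / 2), as in (42)."""
    exp_z = [Fraction(1, factorial(i)) for i in range(limit + 1)]
    exp_half_z2 = [Fraction(1, 2**(i // 2) * factorial(i // 2)) if i % 2 == 0
                   else Fraction(0) for i in range(limit + 1)]
    return [factorial(n) * sum(exp_z[i] * exp_half_z2[n - i] for i in range(n + 1))
            for n in range(limit + 1)]
```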

The first step in analyzing the asymptotic behavior of (41) is to locate the
main contribution to the sum. Since

$$\frac{t_n(k+1)}{t_n(k)} = \frac{(n - 2k)(n - 2k - 1)}{2(k + 1)}, \qquad (43)$$

we can see that the terms gradually increase from k = 0 until $t_n(k+1) \approx t_n(k)$
when k is approximately $\frac12(n - \sqrt{n})$; then they decrease to zero when k exceeds
$\frac12 n$. The main contribution clearly comes from the vicinity of $k = \frac12(n - \sqrt{n})$.
It is usually preferable to have the main contribution at the value 0, so we write

$$k = \tfrac12(n - \sqrt{n}) + x, \qquad (44)$$

and we will investigate the size of $t_n(k)$ as a function of x.
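
The location of the largest term is easy to confirm numerically. This sketch (names ours) works with logarithms via `math.lgamma` to avoid huge integers, and compares the peak with $\frac12(n - \sqrt n)$ and with the ratio (43).

```python
from math import lgamma, log, sqrt


def log_term(n, k):
    """Natural logarithm of t_n(k) = n! / ((n - 2k)! 2^k k!) from (41)."""
    return lgamma(n + 1) - lgamma(n - 2 * k + 1) - k * log(2) - lgamma(k + 1)


def largest_term(n):
    """The index k of the largest term t_n(k) in the sum (41)."""
    return max(range(n // 2 + 1), key=lambda k: log_term(n, k))


def ratio(n, k):
    """The ratio t_n(k+1) / t_n(k) from (43)."""
    return (n - 2 * k) * (n - 2 * k - 1) / (2 * (k + 1))


for n in (100, 10_000, 250_000):
    print(n, largest_term(n), (n - sqrt(n)) / 2)
```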

One useful way to get rid of the factorials in $t_n(k)$ is to use Stirling's
approximation, Eq. 1.2.11.2–(18). For this purpose it is convenient (as we shall
see in a moment) to restrict x to the range

$$-n^{\epsilon + 1/4} \le x \le n^{\epsilon + 1/4}, \qquad (45)$$

where ε = 0.001, say, so that an error term can be included. A somewhat
laborious calculation, which the author did by hand in the 60s but which is now
easily done with the help of computer algebra, yields the formula

$$t_n(k) = \exp\Bigl(\tfrac12 n \ln n - \tfrac12 n + \sqrt{n} - \tfrac14 \ln n - 2x^2/\sqrt{n} - \tfrac14 - \tfrac12 \ln \pi - \tfrac43 x^3/n + 2x/\sqrt{n} + \tfrac13/\sqrt{n} - \tfrac43 x^4/(n\sqrt{n}) + O(n^{5\epsilon - 3/4})\Bigr). \qquad (46)$$

The restriction on x in (45) can be justified by the fact that we may set
$x = \pm n^{\epsilon + 1/4}$ to get an upper bound for all of the discarded terms, namely

$$e^{-2n^{2\epsilon}} \exp\bigl(\tfrac12 n \ln n - \tfrac12 n + \sqrt{n} - \tfrac14 \ln n - \tfrac14 - \tfrac12 \ln \pi + O(n^{3\epsilon - 1/4})\bigr), \qquad (47)$$

and if we multiply this by n we get an upper bound for the sum of the excluded
terms. The upper bound is of lesser order than the terms we will compute for
x in the restricted range (45), because of the factor $\exp(-2n^{2\epsilon})$, which goes to
zero faster than any negative power of n.
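
Formula (46) can be spot-checked numerically. The sketch below (names ours) compares the exact logarithm of $t_n(k)$, computed with `math.lgamma`, with the exponent in (46) minus its O-term, for $n = 10^6$ (where $\frac12(n - \sqrt n)$ is an integer) and $|x| \le 30 \approx n^{1/4}$; the discrepancy should be small, in line with the $O(n^{5\epsilon - 3/4})$ error term.

```python
from math import lgamma, log, pi, sqrt


def log_term(n, k):
    """Natural log of t_n(k) = n!/((n-2k)! 2^k k!), up to rounding."""
    return lgamma(n + 1) - lgamma(n - 2 * k + 1) - k * log(2) - lgamma(k + 1)


def log_term_46(n, x):
    """The exponent in (46) with the O-term dropped."""
    r = sqrt(n)
    return (n * log(n) / 2 - n / 2 + r - log(n) / 4 - 2 * x ** 2 / r
            - 1 / 4 - log(pi) / 2 - 4 * x ** 3 / (3 * n) + 2 * x / r
            + 1 / (3 * r) - 4 * x ** 4 / (3 * n * r))


def worst_error(n, width):
    """Max |log t_n(k) - (46)| over k = (n - sqrt(n))/2 + x, |x| <= width.
    Assumes n is a perfect square with n - sqrt(n) even, so k is an integer."""
    center = (n - round(sqrt(n))) // 2
    return max(abs(log_term(n, center + x) - log_term_46(n, x))
               for x in range(-width, width + 1))


print(worst_error(10 ** 6, 30))
```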

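We can evidently remove the factor

$$\exp\bigl(\tfrac12 n \ln n - \tfrac12 n + \sqrt{n} - \tfrac14 \ln n - \tfrac14 - \tfrac12 \ln \pi + \tfrac13/\sqrt{n}\bigr) \qquad (48)$$

from the sum, and this leaves us with the task of summing

$$\begin{aligned} &\exp\bigl(-2x^2/\sqrt{n} - \tfrac43 x^3/n + 2x/\sqrt{n} - \tfrac43 x^4/(n\sqrt{n}) + O(n^{5\epsilon - 3/4})\bigr) \\ &\quad = \exp\Bigl(\frac{-2x^2}{\sqrt{n}}\Bigr)\Bigl(1 - \frac43\frac{x^3}{n} + \frac89\frac{x^6}{n^2} + O(n^{9\epsilon - 3/4})\Bigr)\Bigl(1 + 2\frac{x}{\sqrt{n}} + 2\frac{x^2}{n} + O(n^{3\epsilon - 3/4})\Bigr)\Bigl(1 - \frac43\frac{x^4}{n\sqrt{n}} + O(n^{8\epsilon - 1})\Bigr)\bigl(1 + O(n^{5\epsilon - 3/4})\bigr) \end{aligned} \qquad (49)$$

over the range x = α, α + 1, ..., β − 2, β − 1, where −α and β are approximately
equal to $n^{\epsilon + 1/4}$ (and not necessarily integers). Euler's summation formula,
Eq. 1.2.11.2–(10), can be written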

$$\sum_{\alpha \le x < \beta} f(x) = \int_\alpha^\beta f(x)\,dx - \frac12 f(x)\Big|_\alpha^\beta + \frac{B_2}{2!} f'(x)\Big|_\alpha^\beta + \cdots + \frac{B_{m+1}}{(m+1)!} f^{(m)}(x)\Big|_\alpha^\beta + R_{m+1} \qquad (50)$$

by translation of the summation interval. Here $|R_m| \le \bigl(4/(2\pi)^m\bigr) \int_\alpha^\beta |f^{(m)}(x)|\,dx$.
If we let $f(x) = x^t \exp(-2x^2/\sqrt{n})$, where t is a fixed nonnegative integer, Euler's
summation formula will give an asymptotic series for $\sum f(x)$ as $n \to \infty$, since

$$f^{(m)}(x) = n^{(t-m)/4}\, g^{(m)}(n^{-1/4} x), \qquad g(y) = y^t e^{-2y^2}, \qquad (51)$$

and g(y) is a well-behaved function independent of n. The derivative $g^{(m)}(y)$ is
$e^{-2y^2}$ times a polynomial in y, hence $R_m = O(n^{(t+1-m)/4}) \int_{-\infty}^{+\infty} |g^{(m)}(y)|\,dy =
O(n^{(t+1-m)/4})$. Furthermore if we replace α and β by −∞ and +∞ in the
right-hand side of (50), we make an error of at most $O(\exp(-2n^{2\epsilon}))$ in each term.
Thus 

$$\sum_{\alpha\le x<\beta}f(x)=\int_{-\infty}^{\infty}f(x)\,dx+O(n^{-m}),\qquad\text{for all }m\ge0;\tag{52}$$

only the integral is really significant, given this particular choice of $f(x)$! The
integral is not difficult to evaluate (see exercise 26), so we can multiply out and
sum formula (49), giving $\sqrt{\pi/2}\,\bigl(n^{1/4}-\tfrac{1}{24}n^{-1/4}+O(n^{-1/2})\bigr)$. Thus

$$t_n=\frac{1}{\sqrt2}\,n^{n/2}e^{-n/2+\sqrt n-1/4}\bigl(1+\tfrac{7}{24}n^{-1/2}+O(n^{-3/4})\bigr).\tag{53}$$


Actually the O-terms here should have an extra $9\epsilon$ in the exponent, but our
manipulations make it clear that this $9\epsilon$ would disappear if we had carried further
accuracy in the intermediate calculations. In principle, the method we have
used could be extended to obtain $O(n^{-k})$ for any $k$, instead of $O(n^{-3/4})$. This
asymptotic series for $t_n$ was first determined (using a different method) by Moser
and Wyman, Canadian J. Math. 7 (1955), 159–168.
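As a numerical sanity check on (53), the Python sketch below computes $t_n$ exactly from the recurrence $t_n=t_{n-1}+(n-1)t_{n-2}$ with $t_0=t_1=1$. That recurrence is the familiar one for involution numbers and is assumed here, not derived in this passage. The sketch then compares $t_n$ with the right-hand side of (53), both with and without the $\frac{7}{24}n^{-1/2}$ term. The function names are illustrative only.

```python
import math

def involution_count(n):
    """Exact t_n from t_0 = t_1 = 1 and t_n = t_{n-1} + (n-1) t_{n-2}."""
    prev, cur = 1, 1  # t_0, t_1
    for k in range(2, n + 1):
        prev, cur = cur, cur + (k - 1) * prev
    return cur

def log_formula_53(n, with_correction=True):
    """Natural log of the right-hand side of (53), with the O-term dropped."""
    value = (0.5 * n * math.log(n) - 0.5 * n + math.sqrt(n) - 0.25
             - 0.5 * math.log(2))
    if with_correction:
        value += math.log1p(7 / (24 * math.sqrt(n)))
    return value

def relative_error(n, with_correction=True):
    """(exact t_n) / (approximation from (53)) - 1, computed through logarithms."""
    # math.log accepts arbitrarily large Python ints, so t_n is never turned into a float.
    return math.exp(math.log(involution_count(n)) - log_formula_53(n, with_correction)) - 1

for n in (10, 100, 1000):
    print(n, relative_error(n, with_correction=False), relative_error(n))
```

The comparison uses logarithms because $t_n$ quickly becomes far too large for floating point. If (53) is right, the error with the correction term should shrink like $O(n^{-3/4})$ or faster. Without the correction it should behave like $\frac{7}{24}n^{-1/2}$.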

The  method  we  have  used  to  derive  (53)  is  an  extremely  useful  technique  for 
asymptotic analysis that was introduced by P. S. Laplace [Mémoires Acad. Sci.
(Paris, 1782), 1–88]; it is discussed under the name “trading tails” in CMath,
§9.4.  For  further  examples  and  extensions  of  tail-trading,  see  the  conclusion  of 
Section  5.2.2. 
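The reason only the integral survives in (52) can also be seen experimentally. The sketch below sums $e^{-2x^2/\sqrt n}$ over a lattice of unit spacing with an arbitrary fractional offset, which is what happens when $\alpha$ is not an integer, and compares the sum with the integral. The helper names are hypothetical, and the sketch takes only $t=0$. For that case the integral is $\sqrt{\pi/2}\,n^{1/4}$, the leading factor quoted above.

```python
import math

def lattice_sum(n, offset, half_width):
    """Sum of exp(-2 x^2 / sqrt(n)) over x = offset + k for |k| <= half_width."""
    root_n = math.sqrt(n)
    return math.fsum(math.exp(-2 * (offset + k) ** 2 / root_n)
                     for k in range(-half_width, half_width + 1))

def gaussian_integral(n):
    """Integral of exp(-2 x^2 / sqrt(n)) over the whole real line."""
    return math.sqrt(math.pi / 2) * n ** 0.25

for n in (16, 256, 10_000):
    for offset in (0.0, 0.3):
        ratio = lattice_sum(n, offset, half_width=400) / gaussian_integral(n)
        print(n, offset, ratio - 1)
```

For $n=16$ the relative difference is already far below $10^{-6}$. For $n=10000$ it is lost in floating-point rounding, in line with the $O(n^{-m})$ error term of (52).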


EXERCISES 

1. [16] What tableaux $(P,Q)$ correspond to the two-line array

$$\begin{pmatrix}1&2&3&4&5&6&7&8&9\\6&4&9&5&7&1&2&8&3\end{pmatrix}$$

in the construction of Theorem A? What two-line array corresponds to the tableaux


$$\begin{matrix}1&4&7\\2&8\\5\\9\end{matrix}\qquad\qquad\begin{matrix}1&3&7\\4&5\\8\\9\end{matrix}\;?$$

2. [M21] Prove that $(q,p)$ belongs to class $t$ with respect to (16) if and only if $t$ is
the largest number of indices $i_1,\ldots,i_t$ such that

$$p_{i_1}<p_{i_2}<\cdots<p_{i_t}=p,\qquad q_{i_1}<q_{i_2}<\cdots<q_{i_t}=q.$$


► 3. [M24] Show that the correspondence defined in the proof of Theorem A can also
be carried out by constructing a table such as this:

$$\begin{array}{lccccc}
\text{Line 0}&1&3&5&6&8\\
\text{Line 1}&7&2&9&5&3\\
\text{Line 2}&\infty&7&\infty&9&5\\
\text{Line 3}&&\infty&&\infty&7\\
\text{Line 4}&&&&&\infty
\end{array}$$

Here lines 0 and 1 constitute the given two-line array. For $k\ge1$, line $k+1$ is formed
from line $k$ by the following procedure:

a) Set $p\leftarrow\infty$.

b) Let column $j$ be the leftmost column in which line $k$ contains an integer $<p$, but
line $k+1$ is blank. If no such columns exist, and if $p=\infty$, line $k+1$ is complete;
if no such columns exist and $p<\infty$, return to (a).

c) Insert $p$ into column $j$ in line $k+1$, then set $p$ equal to the entry in column $j$ of
line $k$ and return to (b).

Once the table has been constructed in this way, row $k$ of $P$ consists of those integers
in line $k$ that are not in line $(k+1)$; row $k$ of $Q$ consists of those integers in line 0 that
appear in a column containing $\infty$ in line $k+1$.
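The table procedure is mechanical enough to transcribe directly. In this Python sketch (function names hypothetical), `None` marks a blank entry and `math.inf` plays the role of ∞. The function `next_line` follows steps (a)–(c). The result is compared with ordinary row insertion, both on the example above and on every permutation of six elements.

```python
import math
from bisect import bisect_right
from itertools import permutations

def is_integer_entry(v):
    """True for an actual integer entry (neither blank nor infinity)."""
    return v is not None and v != math.inf

def next_line(line):
    """Form line k+1 from line k by steps (a)-(c)."""
    new = [None] * len(line)
    while True:
        p = math.inf                                              # step (a)
        while True:
            # step (b): leftmost column with an integer < p in line k, still blank in line k+1
            j = next((c for c, v in enumerate(line)
                      if is_integer_entry(v) and v < p and new[c] is None), None)
            if j is None:
                break
            new[j], p = p, line[j]                                # step (c)
        if p == math.inf:
            return new                                            # line k+1 is complete

def table_correspondence(top, bottom):
    """Build lines 0, 1, 2, ... and read off the tableaux P and Q."""
    lines = [list(top), list(bottom)]
    while any(is_integer_entry(v) for v in lines[-1]):
        lines.append(next_line(lines[-1]))
    P, Q = [], []
    for k in range(1, len(lines) - 1):
        current, below = lines[k], lines[k + 1]
        kept = {v for v in below if is_integer_entry(v)}
        P.append(sorted(v for v in current if is_integer_entry(v) and v not in kept))
        Q.append(sorted(top[c] for c, v in enumerate(below) if v == math.inf))
    return P, Q

def row_insertion(top, bottom):
    """Direct construction of Theorem A by row insertion, for comparison."""
    P, Q = [], []
    for q, x in zip(top, bottom):
        r = 0
        while r < len(P) and P[r][-1] > x:
            j = bisect_right(P[r], x)        # leftmost entry greater than x
            P[r][j], x = x, P[r][j]          # x bumps it into the next row
            r += 1
        if r == len(P):
            P.append([])
            Q.append([])
        P[r].append(x)
        Q[r].append(q)
    return P, Q

P, Q = table_correspondence([1, 3, 5, 6, 8], [7, 2, 9, 5, 3])
print(P, Q)   # [[2, 3], [5, 9], [7]] [[1, 5], [3, 6], [8]]

top = list(range(1, 7))
assert all(table_correspondence(top, list(p)) == row_insertion(top, list(p))
           for p in permutations(top))
```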

► 4. [M30] Let $a_1\ldots a_{j-1}a_j\ldots a_n$ be a permutation of distinct elements, and assume
that $1<j\le n$. The permutation $a_1\ldots a_{j-2}\,a_j\,a_{j-1}\,a_{j+1}\ldots a_n$, obtained by interchanging
$a_{j-1}$ with $a_j$, is called “admissible” if either

i) $j\ge3$ and $a_{j-2}$ lies between $a_{j-1}$ and $a_j$; or

ii) $j<n$ and $a_{j+1}$ lies between $a_{j-1}$ and $a_j$.

For example, exactly three admissible interchanges can be performed on the permutation
1 5 4 6 8 3 7; we can interchange the 1 and the 5 since 1 < 4 < 5; we can interchange
the 8 and the 3 since 3 < 6 < 8 (or since 3 < 7 < 8); but we cannot interchange the 5
and the 4, or the 3 and the 7.

a) Prove that an admissible interchange does not change the tableau $P$ formed from
the permutation by successive insertion of the elements $a_1,a_2,\ldots,a_n$ into an
initially empty tableau.

b) Conversely, prove that any two permutations that have the same $P$ tableau can be
transformed into each other by a sequence of one or more admissible interchanges.
[Hint: Given that the shape of $P$ is $(n_1,n_2,\ldots,n_m)$, show that any permutation
that corresponds to $P$ can be transformed into the “canonical permutation”
$P_{m1}\ldots P_{mn_m}\ldots P_{21}\ldots P_{2n_2}\,P_{11}\ldots P_{1n_1}$ by a sequence of admissible interchanges.]
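Here is a brute-force companion to this exercise, in Python (names hypothetical). The function `admissible_js` applies rules (i) and (ii) literally, and the final check confirms the claim of part (a) for every permutation of six elements. The sketch also shows which third admissible interchange the example leaves implicit: the one that swaps the 4 and the 6, since 4 < 5 < 6. An exhaustive check for small $n$ is evidence, not a proof.

```python
from bisect import bisect_left
from itertools import permutations

def insertion_tableau(seq):
    """Tableau P obtained by inserting the distinct elements of seq one at a time."""
    P = []
    for x in seq:
        for row in P:
            j = bisect_left(row, x)
            if j == len(row):
                row.append(x)          # x fits at the end of this row
                break
            row[j], x = x, row[j]      # x bumps the smallest larger element
        else:
            P.append([x])              # the last bumped element starts a new row
    return P

def lies_between(u, v, w):
    return min(v, w) < u < max(v, w)

def admissible_js(a):
    """All j (1-based, 1 < j <= n) for which interchanging a_{j-1} and a_j is admissible."""
    n = len(a)
    return [j for j in range(2, n + 1)
            if (j >= 3 and lies_between(a[j - 3], a[j - 2], a[j - 1]))    # rule (i)
            or (j < n and lies_between(a[j], a[j - 2], a[j - 1]))]        # rule (ii)

def interchange(a, j):
    """Swap a_{j-1} and a_j (1-based j)."""
    b = list(a)
    b[j - 2], b[j - 1] = b[j - 1], b[j - 2]
    return b

print(admissible_js([1, 5, 4, 6, 8, 3, 7]))   # [2, 4, 6]

# Part (a), checked exhaustively for n = 6: admissible interchanges preserve P.
assert all(insertion_tableau(interchange(p, j)) == insertion_tableau(p)
           for p in permutations(range(1, 7)) for j in admissible_js(p))
```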

► 5. [M22] Let $P$ be the tableau corresponding to the permutation $a_1a_2\ldots a_n$; use
exercise 4 to prove that $P^T$ is the tableau corresponding to $a_n\ldots a_2a_1$.
6. [M26] (M. P. Schützenberger.) Let $\pi$ be an involution with $k$ fixed points. Prove
that the tableau corresponding to $\pi$, in the proof of Corollary B, has exactly $k$ columns
of odd length.

7. [M20] (C. Schensted.) Let $P$ be the tableau corresponding to the permutation
$a_1a_2\ldots a_n$. Prove that the number of columns in $P$ is the longest length $c$ of an
increasing subsequence $a_{i_1}<a_{i_2}<\cdots<a_{i_c}$, where $i_1<i_2<\cdots<i_c$; the number of
rows in $P$ is the longest length $r$ of a decreasing subsequence $a_{j_1}>a_{j_2}>\cdots>a_{j_r}$,
where $j_1<j_2<\cdots<j_r$.
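Schensted's theorem is easy to test by brute force. The sketch below (hypothetical names) builds $P$ by row insertion. It then compares the numbers of columns and rows of $P$ with the longest increasing and decreasing subsequences, found by a simple $O(n^2)$ dynamic program, for every permutation with $n\le7$.

```python
from bisect import bisect_left
from itertools import permutations

def insertion_tableau(seq):
    """Tableau P obtained by inserting the distinct elements of seq one at a time."""
    P = []
    for x in seq:
        for row in P:
            j = bisect_left(row, x)
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]      # bump into the next row
        else:
            P.append([x])
    return P

def longest_monotone_subsequence(seq, increasing=True):
    """Length of a longest strictly increasing (or decreasing) subsequence, O(n^2) DP."""
    best = []
    for i, x in enumerate(seq):
        fits = [best[k] for k in range(i) if (seq[k] < x if increasing else seq[k] > x)]
        best.append(1 + max(fits, default=0))
    return max(best, default=0)

for n in range(1, 8):
    for p in permutations(range(n)):
        P = insertion_tableau(p)
        assert len(P[0]) == longest_monotone_subsequence(p, increasing=True)    # columns
        assert len(P) == longest_monotone_subsequence(p, increasing=False)      # rows
print("Schensted's theorem holds for every permutation with n <= 7")
```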

8. [M18] (P. Erdős, G. Szekeres.) Prove that any permutation containing more than
$n^2$ elements has a monotonic subsequence of length greater than $n$; but there are
permutations of $n^2$ elements with no monotonic subsequences of length greater than $n$.
[Hint: See the previous exercise.]

9. [M24] Continuing exercise 8, find a “simple” formula for the exact number of
permutations of $\{1,2,\ldots,n^2\}$ that have no monotonic subsequences of length greater
than $n$.

10.  [M20]  Prove  that  P is  a tableau  when  Algorithm  S terminates,  if  it  was  a tableau 
initially. 

11.  [20]  Given  only  the  values  of  r and  s after  Algorithm  S terminates,  is  it  possible 
to  restore  P to  its  original  condition? 

12. [M24] How many times is step S3 performed, if Algorithm S is used repeatedly
to delete all elements of a tableau $P$ whose shape is $(n_1,n_2,\ldots,n_m)$? What is the
minimum of this quantity, taken over all shapes with $n_1+n_2+\cdots+n_m=n$?

13.  [M28]  Prove  Theorem  C. 

14.  [M43]  Find  a more  direct  proof  of  Theorem  D,  part  (c). 

15. [M20] How many permutations of the multiset $\{l\cdot a,\,m\cdot b,\,n\cdot c\}$ have the property
that, as we read the permutation from left to right, the number of $c$’s never exceeds the
number of $b$’s, and the number of $b$’s never exceeds the number of $a$’s? (For example,
aabcabbcaca is such a permutation.)

16.  [M08]  In  how  many  ways  can  the  partial  ordering  represented  by  (39)  be  sorted 
topologically? 

17. [HM25] Let

$$g(x_1,x_2,\ldots,x_n;y)=x_1\Delta(x_1+y,x_2,\ldots,x_n)+x_2\Delta(x_1,x_2+y,\ldots,x_n)+\cdots+x_n\Delta(x_1,x_2,\ldots,x_n+y).$$

Prove that

$$g(x_1,x_2,\ldots,x_n;y)=\Bigl(x_1+x_2+\cdots+x_n+\binom n2 y\Bigr)\Delta(x_1,x_2,\ldots,x_n).$$

[Hint: The polynomial $g$ is homogeneous (all terms have the same total degree); and
it is antisymmetric in the $x$’s (interchanging $x_i$ and $x_j$ changes the sign of $g$).]
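Both sides of the identity in exercise 17 are polynomials with integer coefficients, so it can be spot-checked exactly at random integer points. This sketch assumes $\Delta(x_1,\ldots,x_n)=\prod_{i<j}(x_i-x_j)$. The opposite sign convention multiplies both sides by the same factor, so the check is unaffected. Passing such a check is evidence, not a proof.

```python
import random
from itertools import combinations
from math import comb, prod

def delta(xs):
    """Delta(x_1, ..., x_n): product of (x_i - x_j) over i < j (assumed convention)."""
    return prod(xs[i] - xs[j] for i, j in combinations(range(len(xs)), 2))

def g(xs, y):
    """x_1 Delta(x_1+y, x_2, ...) + x_2 Delta(x_1, x_2+y, ...) + ... + x_n Delta(..., x_n+y)."""
    total = 0
    for k, xk in enumerate(xs):
        shifted = list(xs)
        shifted[k] += y
        total += xk * delta(shifted)
    return total

def claimed(xs, y):
    """Right-hand side: (x_1 + ... + x_n + C(n,2) y) Delta(x_1, ..., x_n)."""
    return (sum(xs) + comb(len(xs), 2) * y) * delta(xs)

rng = random.Random(17)
for trial in range(200):
    n = rng.randint(1, 6)
    xs = [rng.randint(-20, 20) for _ in range(n)]
    y = rng.randint(-20, 20)
    assert g(xs, y) == claimed(xs, y)
```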

18. [HM30] Generalizing exercise 17, evaluate the sum

$$x_1^m\Delta(x_1+y,x_2,\ldots,x_n)+x_2^m\Delta(x_1,x_2+y,\ldots,x_n)+\cdots+x_n^m\Delta(x_1,x_2,\ldots,x_n+y),$$

when $m\ge0$.
19. [M40] Find a formula for the number of ways to fill an array that is like a tableau
but with two boxes removed at the left of row 1; for example,

[Diagram: row 1 has $n_1-2$ boxes (its two leftmost boxes are missing), row 2 has $n_2$ boxes, row 3 has $n_3$ boxes.]

is such a shape. (The rows and columns are to be in increasing order, as in ordinary
tableaux.)

In other words, how many tableaux of shape $(n_1,n_2,\ldots,n_m)$ on the elements
$\{1,2,\ldots,n_1+\cdots+n_m\}$ have both of the elements 1 and 2 in the first row?

► 20. [M24] Prove that the number of ways to label the nodes of a given tree with
the elements $\{1,2,\ldots,n\}$, such that the label of each node is less than that of its
descendants, is $n!$ divided by the product of the subtree sizes (the number of nodes in
each subtree). For example, the number of ways to label the nodes of


is $11!/(11\cdot4\cdot1\cdot5\cdot1\cdot2\cdot3\cdot1\cdot1\cdot1\cdot1)=10\cdot9\cdot8\cdot7\cdot6$. (Compare with Theorem H.)
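The labeling count of exercise 20 can be checked by brute force on small trees. The tree pictured in the exercise is not reproduced in this copy. The sketch below therefore uses a hypothetical 11-node tree whose subtree sizes happen to match the product quoted above. The function `count_by_brute_force` tries every labeling, so it is practical only for small trees.

```python
from itertools import permutations
from math import factorial, prod

def subtree_sizes(children, root):
    """Number of nodes in the subtree rooted at each node."""
    sizes = {}
    def visit(v):
        sizes[v] = 1 + sum(visit(c) for c in children.get(v, []))
        return sizes[v]
    visit(root)
    return sizes

def count_by_formula(children, root):
    """n! divided by the product of the subtree sizes."""
    sizes = subtree_sizes(children, root)
    return factorial(len(sizes)) // prod(sizes.values())

def count_by_brute_force(children, root):
    """Try every labeling; parent < child on every edge forces ancestor < descendant."""
    nodes = list(subtree_sizes(children, root))
    parent = {c: p for p, cs in children.items() for c in cs}
    total = 0
    for labels in permutations(range(1, len(nodes) + 1)):
        label = dict(zip(nodes, labels))
        total += all(label[parent[v]] < label[v] for v in parent)
    return total

# A hypothetical 11-node tree with subtree sizes 11, 4, 1, 2, 1, 5, 1, 3, 1, 1, 1.
tree = {"r": ["a", "b", "c"], "a": ["a1", "a2"], "a2": ["a21"],
        "b": ["b1", "b2"], "b2": ["b21", "b22"]}
print(count_by_formula(tree, "r"))           # 30240 = 10*9*8*7*6

small = {0: [1, 2], 1: [3, 4], 2: [5]}
assert count_by_formula(small, 0) == count_by_brute_force(small, 0) == 20
```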

21. [HM31] (R. M. Thrall.) Let $n_1>n_2>\cdots>n_m$ specify the shape of a “shifted
tableau” where row $i+1$ starts one position to the right of row $i$; for example, a shifted
tableau of shape $(7,5,4,1)$ has the form of the diagram

$$\begin{array}{ccccccc}
12&11&8&7&5&4&1\\
&9&6&5&3&2\\
&&5&4&2&1\\
&&&1
\end{array}$$

Prove that the number of ways to put the integers $1,2,\ldots,n=n_1+n_2+\cdots+n_m$ into
shifted tableaux of shape $(n_1,n_2,\ldots,n_m)$, so that rows and columns are in increasing
order, is $n!$ divided by the product of the “generalized hook lengths”; a generalized
hook of length 11, corresponding to the cell in row 1 column 2, has been shaded in
the diagram above. (Hooks in the “inverted staircase” portion of the array, at the left,
have a U-shape, tilted 90°, instead of an L-shape.) Thus there are

$$17!/(12\cdot11\cdot8\cdot7\cdot5\cdot4\cdot1\cdot9\cdot6\cdot5\cdot3\cdot2\cdot5\cdot4\cdot2\cdot1\cdot1)$$

ways to fill the shape with rows and columns in increasing order.
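Here is a Python sketch (hypothetical names) that computes the generalized hook lengths of a shifted shape as described. When a hook's column reaches the staircase cell in column $j$, the hook turns and takes in all of row $j+1$; that is the U-shaped part. The sketch compares $n!/\prod h$ with an independent count, obtained by repeatedly removing the corner that holds the largest number. It reproduces the hook lengths of the $(7,5,4,1)$ diagram.

```python
from functools import lru_cache
from math import factorial, prod

def shifted_hooks(shape):
    """Generalized hook lengths of a shifted shape with strictly decreasing parts.
    Row i (0-based) occupies columns i, i+1, ..., i + shape[i] - 1."""
    m = len(shape)
    def part(k):
        return shape[k] if k < m else 0
    hooks = []
    for i in range(m):
        row = []
        for j in range(i, i + shape[i]):
            arm = i + shape[i] - 1 - j                                   # cells to the right
            leg = sum(1 for k in range(i + 1, m) if k <= j <= k + shape[k] - 1)  # cells below
            # If column j reaches the staircase cell (j, j), the hook turns
            # and also takes in all of row j + 1 (the U-shaped part).
            turn = part(j + 1) if j < m else 0
            row.append(1 + arm + leg + turn)
        hooks.append(row)
    return hooks

def hook_count(shape):
    """n! divided by the product of the generalized hook lengths."""
    return factorial(sum(shape)) // prod(prod(row) for row in shifted_hooks(shape))

@lru_cache(maxsize=None)
def count_by_corners(shape):
    """Independent count: the largest number must occupy a removable corner."""
    if not shape:
        return 1
    total = 0
    for i, length in enumerate(shape):
        below = shape[i + 1] if i + 1 < len(shape) else None
        if below is None or below < length - 1:          # no cell under the row's last cell
            smaller = shape[:i] + (length - 1,) + shape[i + 1:]
            total += count_by_corners(tuple(p for p in smaller if p > 0))
    return total

print(shifted_hooks((7, 5, 4, 1)))   # [[12, 11, 8, 7, 5, 4, 1], [9, 6, 5, 3, 2], [5, 4, 2, 1], [1]]
print(hook_count((7, 5, 4, 1)), count_by_corners((7, 5, 4, 1)))   # 37128 37128
```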

22. [M39] In how many ways can an array of shape $(n_1,n_2,\ldots,n_m)$ be filled with
elements from the set $\{1,2,\ldots,N\}$ with repetitions allowed, so that the rows are
nondecreasing and the columns are strictly increasing? For example, the simple $m$-rowed
shape $(1,1,\ldots,1)$ can be filled in $\binom Nm$ ways; the 1-rowed shape $(m)$ can be filled
in $\binom{m+N-1}{m}$ ways; the small square shape $(2,2)$ in $\frac13\binom{N+1}{2}\binom N2$ ways.
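The three special cases quoted in exercise 22 can be confirmed by exhaustive enumeration for small $N$. The sketch (hypothetical name `count_fillings`) simply tries every assignment of values, so it is usable only for tiny shapes.

```python
from itertools import product
from math import comb

def count_fillings(shape, N):
    """Brute force: entries from {1,...,N}, rows nondecreasing, columns strictly increasing."""
    cells = [(i, j) for i, length in enumerate(shape) for j in range(length)]
    total = 0
    for values in product(range(1, N + 1), repeat=len(cells)):
        a = dict(zip(cells, values))
        rows_ok = all(a[i, j] <= a[i, j + 1] for i, j in cells if (i, j + 1) in a)
        cols_ok = all(a[i, j] < a[i + 1, j] for i, j in cells if (i + 1, j) in a)
        total += rows_ok and cols_ok
    return total

m = 3
for N in range(1, 7):
    assert count_fillings((1,) * m, N) == comb(N, m)               # m-rowed column
    assert count_fillings((m,), N) == comb(m + N - 1, m)           # 1-rowed shape
    assert 3 * count_fillings((2, 2), N) == comb(N + 1, 2) * comb(N, 2)
```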

► 23. [HM30] (D. André.) In how many ways, $A_n$, can the numbers $\{1,2,\ldots,n\}$ be
placed into the array of $n$ cells


in such a way that the rows and columns are in increasing order? Find the generating
function $g(z)=\sum A_nz^n/n!$.

24.  [M28]  Prove  that 


E 


91  H l-9n=t 

0<q1,...,Q„<m 


© -CK ->* 

»»“ 


[Hints: Prove that $\Delta(k_1+n-1,\ldots,k_n)=\Delta(m-k_n+n-1,\ldots,m-k_1)$; decompose an
$n\times(m-n+1)$ tableau in a fashion analogous to (38); and manipulate the sum as in
the derivation of (36).]

25.  [M20]  Why  is  (42)  the  generating  function  for  involutions? 

26. [HM21] Evaluate $\int_{-\infty}^{\infty}x^t\exp(-2x^2/\sqrt n)\,dx$ when $t$ is a nonnegative integer.

27. [M24] Let $Q$ be a Young tableau on $\{1,2,\ldots,n\}$; let the element $i$ be in row $r_i$
and column $c_i$. We say that $i$ is “above” $j$ when $r_i<r_j$.

a) Prove that, for $1\le i<n$, $i$ is above $i+1$ if and only if $c_i\ge c_{i+1}$.

b) Given that $Q$ is such that $(P,Q)$ corresponds to the permutation

$$\begin{pmatrix}1&2&\cdots&n\\a_1&a_2&\cdots&a_n\end{pmatrix},$$

prove that $i$ is above $i+1$ if and only if $a_i>a_{i+1}$. (Therefore we can determine
the number of runs in the permutation, knowing only $Q$. This result is due to
M. P. Schützenberger.)

c) Prove that, for $1\le i<n$, $i$ is above $i+1$ in $Q$ if and only if $i+1$ is above $i$ in $Q^S$.
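Parts (a) and (b) of exercise 27 can be checked exhaustively for small $n$. The Python sketch below (names hypothetical) performs row insertion on $a_1\ldots a_n$ and records in $Q$ the step at which each cell was created. It then compares the “above” relation with $c_i\ge c_{i+1}$ and with the descents $a_i>a_{i+1}$.

```python
from bisect import bisect_left
from itertools import permutations

def rsk(perm):
    """Row insertion of a_1 a_2 ... a_n; Q records when each cell was created."""
    P, Q = [], []
    for t, x in enumerate(perm, start=1):
        r = 0
        while r < len(P) and P[r][-1] > x:
            j = bisect_left(P[r], x)          # smallest entry larger than x (distinct keys)
            P[r][j], x = x, P[r][j]
            r += 1
        if r == len(P):
            P.append([])
            Q.append([])
        P[r].append(x)
        Q[r].append(t)
    return P, Q

def row_and_column(T):
    """Map each entry of tableau T to its (row, column), counted from 0."""
    return {v: (r, c) for r, row in enumerate(T) for c, v in enumerate(row)}

for n in range(2, 8):
    for a in permutations(range(1, n + 1)):
        _, Q = rsk(a)
        pos = row_and_column(Q)
        for i in range(1, n):
            above = pos[i][0] < pos[i + 1][0]                 # r_i < r_{i+1}
            assert above == (pos[i][1] >= pos[i + 1][1])      # part (a)
            assert above == (a[i - 1] > a[i])                 # part (b)
```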

28. [M43] Prove that the average length of the longest increasing subsequence of a
random permutation of $\{1,2,\ldots,n\}$ is asymptotically $2\sqrt n$. (This is the average length
of row 1 in the correspondence of Theorem A.)

29. [HM25] Prove that a random permutation of $n$ elements has an increasing subsequence
of length $\ge l$ with probability $\le\binom nl/l!$. This probability is $O(1/\sqrt n)$ when
$l=e\sqrt n+O(1)$, and $O(\exp(-c\sqrt n))$ when $l=3\sqrt n$, $c=6\ln3-6$.

30. [M41] (M. P. Schützenberger.) Show that the operation of going from $P$ to $P^S$ is
a special case of an operation applicable in connection with any finite partially ordered
set, not merely a tableau: Label the elements of a partially ordered set with the integers
$\{1,2,\ldots,n\}$ in such a way that the partial order is consistent with the labeling. Find
a dual labeling analogous to (26), by successively deleting the labels 1, 2, … while
moving the other labels in a fashion analogous to Algorithm S and placing 1, 2, …
in the vacated places. Show that this operation, when repeated on the dual labeling
in reverse numerical order, yields the original labeling; and explore other properties of
the operation.

31. [HM30] Let $x_n$ be the number of ways to place $n$ mutually nonattacking rooks on
an $n\times n$ chessboard, where each arrangement is unchanged by reflection about both
diagonals. Thus, $x_4=6$. (Involutions are required to be symmetrical about only one
diagonal. Exercise 5.1.3–19 considers a related problem.) Find the asymptotic behavior
of $x_n$.
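For small $n$ the numbers $x_n$ can be found by brute force, which is a useful check on any asymptotic formula. In the sketch below (hypothetical name), a rook arrangement is a permutation $p$ with a rook at $(i,p(i))$, indices counted from 0. Reflection about the main diagonal maps $(i,j)$ to $(j,i)$. Reflection about the other diagonal maps $(i,j)$ to $(n-1-j,\,n-1-i)$.

```python
from itertools import permutations

def doubly_symmetric_rooks(n):
    """Count permutations p of {0,...,n-1} whose rook pattern {(i, p[i])}
    is unchanged by reflection about both diagonals."""
    count = 0
    for p in permutations(range(n)):
        main = all(p[p[i]] == i for i in range(n))                   # (i,j) -> (j,i)
        anti = all(p[n - 1 - p[i]] == n - 1 - i for i in range(n))   # (i,j) -> (n-1-j, n-1-i)
        count += main and anti
    return count

print([doubly_symmetric_rooks(n) for n in range(1, 9)])
```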

32. [HM21] Prove that the involution number $t_n$ is the expected value of $X^n$, when
$X$ is a normal deviate with mean 1 and variance 1.

33. [M25] (O. H. Mitchell, 1881.) True or false: $\Delta(a_1,a_2,\ldots,a_m)/\Delta(1,2,\ldots,m)$ is
an integer when $a_1,a_2,\ldots,a_m$ are integers.

34. [25] (T. Nakayama, 1940.) Prove that if a tableau shape contains a hook of length
$ab$, it contains a hook of length $a$.
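Nakayama's property can be tested over all small shapes. This sketch (hypothetical names) lists every partition of $n\le15$ and computes its hook lengths from the usual formula arm + leg + 1. It then checks that every divisor of every hook length is again a hook length. The exhaustive search supports the claim but does not prove it.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as nonincreasing tuples (tableau shapes)."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def hook_lengths(shape):
    """Hook length (arm + leg + 1) of every cell of an ordinary shape."""
    cols = [sum(1 for length in shape if length > j) for j in range(shape[0])]
    return [shape[i] - j + cols[j] - i - 1
            for i in range(len(shape)) for j in range(shape[i])]

def has_nakayama_property(shape):
    """True if every divisor a of every hook length ab is itself a hook length."""
    hooks = set(hook_lengths(shape))
    return all(a in hooks for h in hooks for a in range(1, h + 1) if h % a == 0)

checked = 0
for n in range(1, 16):
    for shape in partitions(n):
        assert has_nakayama_property(shape), shape
        checked += 1
print(checked, "shapes checked")
```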

► 35. [30] (A. P. Hillman and R. M. Grassl, 1976.) An arrangement of nonnegative
integers $p_{ij}$ in a tableau shape is called a plane partition of $m$ if $\sum p_{ij}=m$ and

$$p_{i1}\ge\cdots\ge p_{in_i},\qquad p_{1j}\ge\cdots\ge p_{n'_jj},\qquad\text{for }1\le i\le n'_1,\ 1\le j\le n_1,$$

when there are $n_i$ cells in row $i$ and $n'_j$ cells in column $j$. It is called a reverse plane
partition if instead

$$p_{i1}\le\cdots\le p_{in_i},\qquad p_{1j}\le\cdots\le p_{n'_jj},\qquad\text{for }1\le i\le n'_1,\ 1\le j\le n_1.$$

Consider the following algorithm, which operates on reverse plane partitions of a given
shape and constructs another array of numbers $q_{ij}$ having the same shape:

G1. [Initialize.] Set $q_{ij}\leftarrow0$ for $1\le j\le n_i$ and $1\le i\le n'_1$. Then set $j\leftarrow1$.

G2. [Find nonzero cell.] If $p_{n'_jj}>0$, set $i\leftarrow n'_j$, $k\leftarrow j$, and go on to step G3.
Otherwise if $j<n_1$, increase $j$ by 1 and repeat this step. Otherwise stop (the
$p$ array is now zero).

G3. [Decrease $p$.] Decrease $p_{ik}$ by 1.

G4. [Move up or right.] If $i>1$ and $p_{(i-1)k}>p_{ik}$, decrease $i$ by 1 and return
to G3. Otherwise if $k<n_i$, increase $k$ by 1 and return to G3.

G5. [Increase $q$.] Increase $q_{ij}$ by 1 and return to G2. |

Prove that this construction defines a one-to-one correspondence between reverse plane
partitions of $m$ and solutions of the equation

$$m=\sum_{i,j}h_{ij}q_{ij},$$

where the numbers $h_{ij}$ are the hook lengths of the shape, by designing an algorithm
that recomputes the $p$’s from the $q$’s.
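Steps G1–G5 translate directly into Python. In the sketch below (0-based indices, hypothetical helper names), `hillman_grassl` follows the steps as written. The loop at the end works on shape $(3,2,1)$ with all $m\le4$ and checks three things: that $\sum h_{ij}q_{ij}=m$, that no two reverse plane partitions give the same $q$ array, and that every solution of $\sum h_{ij}q_{ij}\le4$ is reached. This is a consistency check, not the inverse algorithm the exercise asks for.

```python
from itertools import product

def column_lengths(shape):
    """n'_j for each column j (0-based)."""
    return [sum(1 for length in shape if length > j) for j in range(shape[0])]

def hook_lengths(shape):
    """h_ij = arm + leg + 1, arranged by rows."""
    cols = column_lengths(shape)
    return [[shape[i] - j + cols[j] - i - 1 for j in range(shape[i])]
            for i in range(len(shape))]

def hillman_grassl(shape, p):
    """Steps G1-G5 with 0-based indices; the reverse plane partition p is not modified."""
    p = [list(row) for row in p]
    q = [[0] * length for length in shape]           # G1
    cols = column_lengths(shape)
    j = 0
    while True:
        while j < shape[0] and p[cols[j] - 1][j] == 0:
            j += 1                                    # G2: find a nonzero column bottom
        if j == shape[0]:
            return q                                  # the p array is now zero
        i, k = cols[j] - 1, j
        while True:
            p[i][k] -= 1                              # G3
            if i > 0 and p[i - 1][k] > p[i][k]:       # G4: move up ...
                i -= 1
            elif k < shape[i] - 1:                    # ... or else move right
                k += 1
            else:
                break
        q[i][j] += 1                                  # G5

def reverse_plane_partitions(shape, max_sum):
    """All reverse plane partitions of the shape with entry sum <= max_sum (brute force)."""
    cells = [(i, j) for i, length in enumerate(shape) for j in range(length)]
    for values in product(range(max_sum + 1), repeat=len(cells)):
        if sum(values) > max_sum:
            continue
        a = dict(zip(cells, values))
        rows_ok = all(a[i, j] <= a[i, j + 1] for i, j in cells if (i, j + 1) in a)
        cols_ok = all(a[i, j] <= a[i + 1, j] for i, j in cells if (i + 1, j) in a)
        if rows_ok and cols_ok:
            yield [[a[i, j] for j in range(shape[i])] for i in range(len(shape))]

shape, M = (3, 2, 1), 4
h = hook_lengths(shape)
images = {}
for p in reverse_plane_partitions(shape, M):
    q = hillman_grassl(shape, p)
    weight = sum(h[i][j] * q[i][j] for i in range(len(shape)) for j in range(shape[i]))
    assert weight == sum(map(sum, p))             # m = sum of h_ij q_ij
    key = tuple(map(tuple, q))
    assert key not in images                      # different p's give different q's
    images[key] = p
flat = [x for row in h for x in row]
solutions = sum(1 for q in product(*(range(M // x + 1) for x in flat))
                if sum(x * y for x, y in zip(flat, q)) <= M)
assert len(images) == solutions                   # every solution with m <= M is reached
print(len(images), "reverse plane partitions with m <=", M)
```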

36. [HM27] (R. P. Stanley, 1971.) (a) Prove that the number of reverse plane partitions
of $m$ in a given shape is $[z^m]\,1/\prod(1-z^{h_{ij}})$, where the numbers $h_{ij}$ are the
hook lengths of the shape. (b) Derive Theorem H from this result. [Hint: What is the
asymptotic number of partitions as $m\to\infty$?]
37. [M20] (P. A. MacMahon, 1912.) What is the generating function for all plane
partitions? (The coefficient of $z^m$ should be the total number of plane partitions of $m$
when the tableau shape is unbounded.)

38. [M30] (Greene, Nijenhuis, and Wilf, 1979.) We can construct a directed acyclic
graph on the cells $T$ of any given tableau shape by letting arcs run from each cell to
the other cells in its hook; the out-degree of cell $(i,j)$ will then be $d_{ij}=h_{ij}-1$, where
$h_{ij}$ is the hook length. Suppose we generate a random path in this digraph by choosing
a random starting cell $(i,j)$ and choosing further arcs at random, until coming to a
corner cell from which there is no exit. Each random choice is made uniformly.

a) Let $(a,b)$ be a corner cell of $T$, and let $I=\{i_0,\ldots,i_k\}$ and $J=\{j_0,\ldots,j_l\}$ be
sets of rows and columns with $i_0<\cdots<i_k=a$ and $j_0<\cdots<j_l=b$. The
digraph contains $\binom{k+l}{l}$ paths whose row and column sets are respectively $I$ and $J$;
let $P(I,J)$ be the probability that the random path is one of these. Prove that

$$P(I,J)=1/(n\,d_{i_0b}\ldots d_{i_{k-1}b}\,d_{aj_0}\ldots d_{aj_{l-1}}),\qquad\text{where }n=|T|.$$

b) Let $f(T)=n!/\prod h_{ij}$. Prove that the random path ends at corner $(a,b)$ with
probability $f(T\setminus\{(a,b)\})/f(T)$.

c) Show that the result of (b) proves Theorem H and also gives us a way to generate
a random tableau of shape $T$, with all $f(T)$ tableaux equally likely.
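Part (c) suggests a way to generate random tableaux. Run the random hook walk, put the largest remaining number in the corner where it stops, delete that corner, and repeat. The Python sketch below (hypothetical names) implements this with the uniform choices described above. For shape $(3,2)$, which has $f(T)=5$ tableaux, the empirical frequencies should all be close to $1/5$.

```python
import random
from collections import Counter

def hook_walk_tableau(shape, rng=random):
    """Random standard tableau of the given shape via repeated random hook walks."""
    rows = list(shape)                       # current shape; loses one corner per step
    tableau = [[0] * length for length in shape]
    for label in range(sum(shape), 0, -1):
        cells = [(i, j) for i, length in enumerate(rows) for j in range(length)]
        i, j = rng.choice(cells)             # uniformly random starting cell
        while True:
            arm = rows[i] - j - 1            # hook cells to the right
            leg = sum(1 for length in rows[i + 1:] if length > j)   # hook cells below
            if arm + leg == 0:
                break                        # a corner cell: no exit
            step = rng.randrange(arm + leg)  # uniform choice among the other hook cells
            if step < arm:
                j += 1 + step
            else:
                i += 1 + (step - arm)
        tableau[i][j] = label                # the largest remaining label goes in the corner
        rows[i] -= 1                         # delete that corner and repeat
    return tableau

rng = random.Random(1979)
counts = Counter(tuple(map(tuple, hook_walk_tableau((3, 2), rng))) for _ in range(20000))
for t, c in sorted(counts.items()):
    print(t, c / 20000)
```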


39. [M38] (I. M. Pak and A. V. Stoyanovskii, 1992.) Let $P$ be an array of shape
$(n_1,\ldots,n_m)$ that has been filled with any permutation of the integers $\{1,\ldots,n\}$, where
$n=n_1+\cdots+n_m$. The following procedure, which is analogous to the “siftup” algorithm
in Section 5.2.3, can be used to convert $P$ to a tableau. It also defines an array $Q$ of
the same shape, which can be used to provide a combinatorial proof of Theorem H.

P1. [Loop on $(i,j)$.] Perform steps P2 and P3 for all cells $(i,j)$ of the array, in
reverse lexicographic order (that is, from bottom to top, and from right to
left in each row); then stop.

P2. [Fix $P$ at $(i,j)$.] Set $K\leftarrow P_{ij}$ and perform Algorithm S′ (see below).

P3. [Adjust $Q$.] Set $Q_{ik}\leftarrow Q_{i(k+1)}+1$ for $j\le k<s$, and set $Q_{is}\leftarrow i-r$. |

Here Algorithm S′ is the same as Schützenberger’s Algorithm S, except that steps S1
and S2 are generalized slightly:

S1′. [Initialize.] Set $r\leftarrow i$, $s\leftarrow j$.

S2′. [Done?] If $K\le P_{(r+1)s}$ and $K\le P_{r(s+1)}$, set $P_{rs}\leftarrow K$ and terminate.

(Algorithm S is essentially the special case $i=1$, $j=1$, $K=\infty$.)

For example, Algorithm P straightens out one particular array of shape $(3,3,2)$
in the following way, if we view the contents of arrays $P$ and $Q$ at the beginning of
step P2, with $P_{ij}$ in boldface type:
The final result is

$$P=\begin{matrix}1&3&4\\2&5&8\\6&7\end{matrix}\qquad\qquad Q=\begin{matrix}1&-2&-1\\0&-1&0\\1&0\end{matrix}$$

a) If $P$ is simply a $1\times n$ array, Algorithm P sorts it into increasing order. Explain what
the $Q$ array will contain in that case.

b) Answer the same question if $P$ is $n\times1$ instead of $1\times n$.

c) Prove that, in general, we will have

$$-b_{ij}\le Q_{ij}\le r_{ij},$$

where $b_{ij}$ is the number of cells below $(i,j)$ and $r_{ij}$ is the number of cells to the
right. Thus, the number of possible values for $Q_{ij}$ is exactly $h_{ij}$, the size of the
$(i,j)$th hook.

d)  Theorem  H will  be  proved  constructively  if  we  can  show  that  Algorithm  P defines 
a one-to-one  correspondence  between  the  n!  ways  to  fill  the  original  shape  and  the 
pairs  of  output  arrays  ( P,Q ),  where  P is  a tableau  and  the  elements  of  Q satisfy 
the  condition  of  part  (c).  Therefore  we  want  to  find  an  inverse  of  Algorithm  P.  For 
what  initial  permutations  does  Algorithm  P produce  the  2x2  array  Q = (°  ~x)? 

e) What initial permutation does Algorithm P convert into the arrays

$$P=\begin{matrix}1&3&5&7&11&15\\2&6&8&14\\4&9&13\\10&12\\16\end{matrix}\qquad\qquad Q=\begin{matrix}-2&-3&-1&-1&1&0\\?&-2&-1&0\\0&-1&0\\-1&0\\0\end{matrix}\;?$$

f) Design an algorithm that inverts Algorithm P, given any pair of arrays $(P,Q)$
such that $P$ is a tableau and $Q$ satisfies the condition of part (c). [Hint: Construct
an oriented tree whose vertices are the cells $(i,j)$, with arcs

$$\begin{aligned}(i,j)&\to(i,j-1)&&\text{if }P_{i(j-1)}>P_{(i-1)j};\\(i,j)&\to(i-1,j)&&\text{if }P_{i(j-1)}<P_{(i-1)j}.\end{aligned}$$

In the example of part (e) we have the tree


The paths of this tree hold the key to inverting Algorithm P.]
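Algorithm P can be transcribed into Python for experiments. The sketch below (0-based indices, hypothetical helper names) treats cells outside the shape as ∞. Step S2′ therefore stops when $K$ is smaller than both neighbours; otherwise the smaller neighbour slides into the vacancy. For a few small shapes the sketch checks three things: $P$ always comes out as a tableau, $Q$ satisfies the bounds of part (c), and the $n!$ fillings give $n!$ distinct pairs $(P,Q)$. That last check is consistent with part (d), but it does not replace the inverse algorithm requested in part (f).

```python
import math
from itertools import permutations

def algorithm_p(shape, filling):
    """Steps P1-P3 with Algorithm S' (0-based; cells outside the shape act as infinity)."""
    P = [list(row) for row in filling]
    Q = [[0] * length for length in shape]

    def entry(r, s):
        return P[r][s] if r < len(shape) and s < shape[r] else math.inf

    cells = [(i, j) for i in range(len(shape)) for j in range(shape[i])]
    for i, j in reversed(cells):                  # P1: bottom to top, right to left
        K = P[i][j]                               # P2
        r, s = i, j                               # S1'
        while not (K < entry(r + 1, s) and K < entry(r, s + 1)):    # S2'
            if entry(r + 1, s) < entry(r, s + 1):
                P[r][s] = P[r + 1][s]             # smaller neighbour below moves up
                r += 1
            else:
                P[r][s] = P[r][s + 1]             # smaller neighbour on the right moves left
                s += 1
        P[r][s] = K
        for k in range(j, s):                     # P3
            Q[i][k] = Q[i][k + 1] + 1
        Q[i][s] = i - r
    return P, Q

def fill(shape, values):
    """Cut a sequence of values into rows of the given shape."""
    rows, start = [], 0
    for length in shape:
        rows.append(list(values[start:start + length]))
        start += length
    return rows

def is_tableau(P):
    rows_ok = all(row[c] < row[c + 1] for row in P for c in range(len(row) - 1))
    cols_ok = all(P[r][c] < P[r + 1][c] for r in range(len(P) - 1) for c in range(len(P[r + 1])))
    return rows_ok and cols_ok

def within_bounds(shape, Q):
    """-b_ij <= Q_ij <= r_ij, where b_ij counts cells below and r_ij cells to the right."""
    cols = [sum(1 for length in shape if length > j) for j in range(shape[0])]
    return all(-(cols[j] - i - 1) <= Q[i][j] <= shape[i] - j - 1
               for i in range(len(shape)) for j in range(shape[i]))

for shape in [(3, 2), (2, 2, 1), (3, 1, 1)]:
    n = sum(shape)
    pairs = set()
    for perm in permutations(range(1, n + 1)):
        P, Q = algorithm_p(shape, fill(shape, perm))
        assert is_tableau(P) and within_bounds(shape, Q)
        pairs.add((tuple(map(tuple, P)), tuple(map(tuple, Q))))
    print(shape, len(pairs) == math.factorial(n))
```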

40.  [HM43]  Suppose  a random  Young  tableau  has  been  constructed  by  successively 
placing  the  numbers  1,  2,  . . . , n in  such  a way  that  each  possibility  is  equally  likely 
when  a new  number  is  placed.  For  example,  the  tableau  (1)  would  be  obtained  with 
probability  I.l.l.I.I.I.l.I.I.i.i.I.I.I.I  using  this  procedure. 

Prove that, with high probability, the resulting shape $(n_1,n_2,\ldots,n_m)$ will have
$m\approx\sqrt{6n}$ and $\sqrt k+\sqrt{n_{k+1}}\approx\sqrt m$ for $0\le k\le m$.
41. [25] (Disorder in a library.) Casual users of a library often put books back on the
shelves in the wrong place. One way to measure the amount of disorder present in a
library is to consider the minimum number of times we would have to take a book out
of one place and insert it in another, before all books are restored to the correct order.

Thus let $\pi=a_1a_2\ldots a_n$ be a permutation of $\{1,2,\ldots,n\}$. A “deletion-insertion
operation” changes $\pi$ to

$$a_1\ldots a_{i-1}\,a_{i+1}\ldots a_j\,a_i\,a_{j+1}\ldots a_n\qquad\text{or}\qquad a_1\ldots a_j\,a_i\,a_{j+1}\ldots a_{i-1}\,a_{i+1}\ldots a_n,$$

for some $i$ and $j$. Let $\operatorname{dis}(\pi)$ be the minimum number of deletion-insertion operations
that will sort $\pi$ into order. Can $\operatorname{dis}(\pi)$ be expressed in terms of simpler characteristics
of $\pi$?

► 42. [30] (Disorder in a genome.) The DNA of Lobelia fervens has genes occurring
in the sequence $g_7^Rg_1g_2g_4g_5g_3g_6^R$, where $g^R$ stands for the left-right reflection
of $g$; the same genes occur in tobacco plants, but in the order $g_1g_2g_3g_4g_5g_6g_7$. Show
that five “flip” operations on substrings are needed to get from $g_1g_2g_3g_4g_5g_6g_7$ to
$g_7^Rg_1g_2g_4g_5g_3g_6^R$. (A flip takes $\alpha\beta\gamma$ to $\alpha\beta^R\gamma$, when $\alpha$, $\beta$, and $\gamma$ are strings.)
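A computer can confirm the count of five flips. In the sketch below (hypothetical names), gene $g_k$ is encoded as $k$ and $g_k^R$ as $-k$, and a flip reverses a block and negates its entries. The distance is found by a meet-in-the-middle breadth-first search over every arrangement within three flips of each end, which is exact whenever the distance is at most six.

```python
def flip(perm, i, j):
    """Reverse the block perm[i..j] (0-based, inclusive) and reflect each gene in it."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]

def neighbours(perm):
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield flip(perm, i, j)

def ball(start, radius):
    """Breadth-first search: distance to every arrangement within `radius` flips."""
    dist = {start: 0}
    frontier = [start]
    for d in range(1, radius + 1):
        next_frontier = []
        for perm in frontier:
            for other in neighbours(perm):
                if other not in dist:
                    dist[other] = d
                    next_frontier.append(other)
        frontier = next_frontier
    return dist

def flip_distance(source, target, half=3):
    """Exact flip distance if it is at most 2*half, otherwise None."""
    near_source = ball(source, half)
    near_target = ball(target, half)
    common = [d + near_target[p] for p, d in near_source.items() if p in near_target]
    return min(common) if common else None

tobacco = (1, 2, 3, 4, 5, 6, 7)
lobelia = (-7, 1, 2, 4, 5, 3, -6)      # g7^R g1 g2 g4 g5 g3 g6^R
print(flip_distance(tobacco, lobelia))  # 5
```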

43. [35] Continuing the previous exercise, show that at most $n+1$ flips are needed
to sort any rearrangement of $g_1g_2\ldots g_n$. Construct examples that require $n+1$ flips,
for all $n\ge3$.

44. [M37] Show that the average number of flips required to sort a random arrangement
of $n$ genes is greater than $n-H_n$, if all $2^nn!$ genome rearrangements are equally
likely.
5.2. INTERNAL SORTING

Let’s BEGIN our discussion of good “sortsmanship” by conducting a little experiment.
How would you solve the following programming problem?

“Memory locations R+1, R+2, R+3, R+4, and R+5 contain five numbers.
Write  a computer  program  that  rearranges  these  numbers,  if  necessary, 
so  that  they  are  in  ascending  order.” 

(If  you  already  are  familiar  with  some  sorting  methods,  please  do  your  best  to 
forget  about  them  momentarily;  imagine  that  you  are  attacking  this  problem  for 
the  first  time,  without  any  prior  knowledge  of  how  to  proceed.) 

Before  reading  any  further,  you  are  requested  to  construct  a solution  to  this 
problem. 


The  time  you  spent  working  on  the  challenge  problem  will  pay  dividends 
as  you  continue  to  read  this  chapter.  Chances  are  your  solution  is  one  of  the 
following  types: 

A.  An  insertion  sort.  The  items  are  considered  one  at  a time,  and  each  new 
item  is  inserted  into  the  appropriate  position  relative  to  the  previously-sorted 
items.  (This  is  the  way  many  bridge  players  sort  their  hands,  picking  up  one 
card  at  a time.) 

B.  An  exchange  sort.  If  two  items  are  found  to  be  out  of  order,  they  are 
interchanged.  This  process  is  repeated  until  no  more  exchanges  are  necessary. 

C. A selection sort. First the smallest (or perhaps the largest) item is located,
and it is somehow separated from the rest; then the next smallest (or next
largest) is selected, and so on.

D.  An  enumeration  sort.  Each  item  is  compared  with  each  of  the  others;  an 
item’s  final  position  is  determined  by  the  number  of  keys  that  it  exceeds. 

E.  A special-purpose  sort,  which  works  nicely  for  sorting  five  elements  as 
stated  in  the  problem,  but  does  not  readily  generalize  to  larger  numbers  of  items. 

F.  A lazy  attitude,  with  which  you  ignored  the  suggestion  above  and  decided 
not  to  solve  the  problem  at  all.  Sorry,  by  now  you  have  read  too  far  and  you 
have  lost  your  chance. 

G.  A new,  super  sorting  technique  that  is  a definite  improvement  over  known 
methods.  (Please  communicate  this  to  the  author  at  once.) 

If  the  problem  had  been  posed  for,  say,  1000  items,  not  merely  5,  you  might 
also  have  discovered  some  of  the  more  subtle  techniques  that  will  be  mentioned 
later.  At  any  rate,  when  attacking  a new  problem  it  is  often  wise  to  find  some 
fairly  obvious  procedure  that  works,  and  then  try  to  improve  upon  it.  Cases  A,  B, 
and  C above  lead  to  important  classes  of  sorting  techniques  that  are  refinements 
of  the  simple  ideas  stated. 

Many different sorting algorithms have been invented, and we will be discussing
about 25 of them in this book. This rather alarming number of methods
is actually only a fraction of the algorithms that have been devised so far;
many techniques that are now obsolete will be omitted from our discussion, or
mentioned only briefly. Why are there so many sorting methods? For computer
programming, this is a special case of the question, “Why are there so many $x$
methods?”, where $x$ ranges over the set of problems; and the answer is that each
method  has  its  own  advantages  and  disadvantages,  so  that  it  outperforms  the 
others  on  some  configurations  of  data  and  hardware.  Unfortunately,  there  is  no 
known  “best”  way  to  sort;  there  are  many  best  methods,  depending  on  what 
is  to  be  sorted  on  what  machine  for  what  purpose.  In  the  words  of  Rudyard 
Kipling,  “There  are  nine  and  sixty  ways  of  constructing  tribal  lays,  and  every 
single  one  of  them  is  right.” 

It  is  a good  idea  to  learn  the  characteristics  of  each  sorting  method,  so  that 
an  intelligent  choice  can  be  made  for  particular  applications.  Fortunately,  it  is 
not  a formidable  task  to  learn  these  algorithms,  since  they  are  interrelated  in 
interesting  ways. 

At the beginning of this chapter we defined the basic terminology and
notation to be used in our study of sorting: The records

$$R_1,R_2,\ldots,R_N\tag{1}$$

are supposed to be sorted into nondecreasing order of their keys $K_1,K_2,\ldots,K_N$,
essentially by discovering a permutation $p(1)\,p(2)\ldots p(N)$ such that

$$K_{p(1)}\le K_{p(2)}\le\cdots\le K_{p(N)}.\tag{2}$$

In  the  present  section  we  are  concerned  with  internal  sorting,  when  the  number 
of  records  to  be  sorted  is  small  enough  that  the  entire  process  can  be  performed 
in  a computer’s  high-speed  memory. 

In  some  cases  we  will  want  the  records  to  be  physically  rearranged  in  memory 
so  that  their  keys  are  in  order,  while  in  other  cases  it  may  be  sufficient  merely 
to  have  an  auxiliary  table  of  some  sort  that  specifies  the  permutation.  If  the 
records  and/or  the  keys  each  take  up  quite  a few  words  of  computer  memory, 
it  is  often  better  to  make  up  a new  table  of  link  addresses  that  point  to  the 
records,  and  to  manipulate  these  link  addresses  instead  of  moving  the  bulky 
records  around.  This  method  is  called  address  table  sorting  (see  Fig.  6).  If  the 
key  is  short  but  the  satellite  information  of  the  records  is  long,  the  key  may  be 
placed  with  the  link  addresses  for  greater  speed;  this  is  called  keysorting.  Other 
sorting  schemes  utilize  an  auxiliary  link  field  that  is  included  in  each  record; 
these  links  are  manipulated  in  such  a way  that,  in  the  final  result,  the  records 
are  linked  together  to  form  a straight  linear  list,  with  each  link  pointing  to  the 
following  record.  This  is  called  list  sorting  (see  Fig.  7). 
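To make these distinctions concrete, here is a Python sketch of the three output representations (all names hypothetical). Python's built-in `sorted` does the actual ordering in each case. What matters is what gets sorted and what the result looks like: an address table, (key, address) pairs for keysorting, or link fields that chain the records into a linear list. The records themselves are never moved.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Record:
    key: int
    satellite: str                 # stands in for bulky satellite information
    link: Optional[int] = None     # auxiliary link field, used only by list sorting

def address_table_sort(records: List[Record]) -> List[int]:
    """Sort a table of addresses (list indices); the records themselves never move."""
    return sorted(range(len(records)), key=lambda a: records[a].key)

def keysort(records: List[Record]) -> List[int]:
    """Copy each short key next to its address and sort those (key, address) pairs."""
    table = sorted((r.key, a) for a, r in enumerate(records))
    return [a for _, a in table]

def list_sort(records: List[Record]) -> Optional[int]:
    """Set link fields so the records form a linear list in key order; return its head."""
    order = address_table_sort(records)    # any sorting method could produce this order
    for a, b in zip(order, order[1:]):
        records[a].link = b
    if order:
        records[order[-1]].link = None
    return order[0] if order else None

records = [Record(503, "first"), Record(87, "second"), Record(512, "third"), Record(61, "fourth")]
print(address_table_sort(records))        # [3, 1, 0, 2]
head = list_sort(records)
walk = []
while head is not None:
    walk.append(records[head].key)
    head = records[head].link
print(walk)                               # [61, 87, 503, 512]
```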

After sorting with an address table or list method, the records can be rearranged
into increasing order as desired. Exercises 10 and 12 discuss interesting
ways  to  do  this,  requiring  only  enough  additional  memory  space  to  hold  one 
record;  alternatively,  we  can  simply  move  the  records  into  a new  area  capable 
of  holding  all  records.  The  latter  method  is  usually  about  twice  as  fast  as  the 
former,  but  it  demands  nearly  twice  as  much  storage  space.  Many  applications 
can  get  by  without  moving  the  records  at  all,  since  the  link  fields  are  often 
adequate  for  all  of  the  subsequent  processing. 
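One common way to do such a rearrangement with room for only one extra record is to follow the cycles of the permutation. The sketch below is a generic cycle-following method, not necessarily the technique of exercises 10 and 12. It assumes `order[k]` is the index of the record that belongs in position $k$, as in an address table. For simplicity it marks finished positions in a private copy of that table; an in-place version would mark the table itself.

```python
def rearrange_in_place(records, order):
    """Move records so that position k receives the record originally at order[k].
    Each cycle of the permutation is followed using a single temporary record."""
    order = list(order)                 # private copy, used to mark finished positions
    for start in range(len(records)):
        if order[start] == start:
            continue                    # fixed point, or its cycle is already done
        temp = records[start]           # the one record's worth of extra storage
        k = start
        while order[k] != start:
            source = order[k]
            records[k] = records[source]
            order[k] = k                # position k now holds its final record
            k = source
        records[k] = temp
        order[k] = k
    return records

keys = [503, 87, 512, 61, 908, 170]
order = sorted(range(len(keys)), key=keys.__getitem__)    # an address table
print(rearrange_in_place(keys, order))                     # [61, 87, 170, 503, 512, 908]
```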
Fig. 6. Address table sorting.


Fig. 7. List sorting.

All  of  the  sorting  methods  that  we  shall  examine  in  depth  will  be  illustrated 
in  four  ways,  by  means  of 

a)  an  English-language  description  of  the  algorithm, 

b)  a flow  diagram, 

c)  a MIX  program,  and 

d)  an  example  of  the  sorting  method  applied  to  a certain  set  of  16  numbers. 

For  convenience,  the  MIX  programs  will  usually  assume  that  the  key  is  numeric 
and  that  it  fits  in  a single  word;  sometimes  we  will  even  restrict  the  key  to  part 
of  a word.  The  order  relation  < will  be  ordinary  arithmetic  order;  and  the  record 
will  consist  of  the  key  alone,  with  no  satellite  information.  These  assumptions 
make  the  programs  shorter  and  easier  to  understand,  and  a reader  should  find 
it  fairly  easy  to  adapt  any  of  the  programs  to  the  general  case  by  using  address 
table  sorting  or  list  sorting.  An  analysis  of  the  running  time  of  each  sorting 
algorithm  will  be  given  with  the  MIX  programs. 

Sorting  by  counting.  As  a simple  example  of  the  way  in  which  we  shall  study 
internal  sorting  methods,  let  us  consider  the  “counting”  idea  mentioned  near 
the  beginning  of  this  section.  This  simple  method  is  based  on  the  idea  that  the 
$j$th key in the final sorted sequence is greater than exactly $j-1$ of the other
keys.  Putting  this  another  way,  if  we  know  that  a certain  key  exceeds  exactly 
27 others, and if no two keys are equal, the corresponding record should go into
position 28 after sorting. So the idea is to compare every pair of keys, counting
how  many  are  less  than  each  particular  one. 

The  obvious  way  to  do  the  comparisons  is  to 

$$((\text{compare } K_j \text{ with } K_i) \text{ for } 1 \le j \le N) \text{ for } 1 \le i \le N;$$

but it is easy to see that more than half of these comparisons are redundant,
since it is unnecessary to compare a key with itself, and it is unnecessary to
compare $K_a$ with $K_b$, and later to compare $K_b$ with $K_a$. We need merely to

$$((\text{compare } K_j \text{ with } K_i) \text{ for } 1 \le j < i) \text{ for } 1 < i \le N.$$

Hence  we  are  led  to  the  following  algorithm. 

Algorithm C (Comparison counting). This algorithm sorts $R_1, \ldots, R_N$ on the
keys $K_1, \ldots, K_N$ by maintaining an auxiliary table COUNT[1], ..., COUNT[N] to
count the number of keys less than a given key. After the conclusion of the
algorithm, COUNT[j] + 1 will specify the final position of record $R_j$.

C1. [Clear COUNTs.] Set COUNT[1] through COUNT[N] to zero.

C2. [Loop on $i$.] Perform step C3, for $i = N, N-1, \ldots, 2$; then terminate the
algorithm.

C3. [Loop on $j$.] Perform step C4, for $j = i-1, i-2, \ldots, 1$.

C4. [Compare $K_i : K_j$.] If $K_i < K_j$, increase COUNT[j] by 1; otherwise increase
COUNT[i] by 1. |
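Algorithm C translates almost line for line into Python. The sketch below is ours and is not part of the original text. It uses 0-based indices, so COUNT[j] itself (rather than COUNT[j] + 1) is the final position of record j. The helper `place_by_count` is a hypothetical name for the step that actually moves the records.

```python
def comparison_counting(keys):
    """Algorithm C: return COUNT, where COUNT[j] is the number of records that
    precede record j in the final order (0-based positions)."""
    n = len(keys)
    count = [0] * n                        # C1. Clear COUNTs.
    for i in range(n - 1, 0, -1):          # C2. i = N, N-1, ..., 2 (0-based n-1 .. 1).
        for j in range(i - 1, -1, -1):     # C3. j = i-1, ..., 1 (0-based i-1 .. 0).
            if keys[i] < keys[j]:          # C4. Compare K_i : K_j.
                count[j] += 1
            else:
                count[i] += 1
    return count


def place_by_count(records, count):
    """Move each record R_j directly to the position that COUNT[j] names."""
    output = [None] * len(records)
    for j, record in enumerate(records):
        output[count[j]] = record
    return output


keys = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
count = comparison_counting(keys)
```

With the sixteen keys of Table 1, `count` agrees with the last row of that table, and `place_by_count(keys, count)` yields the keys in ascending order.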

Note  that  this  algorithm  involves  no  movement  of  records.  It  is  similar  to 
an  address  table  sort,  since  the  COUNT  table  specifies  the  final  arrangement  of 
records;  but  it  is  somewhat  different  because  COUNT  [j]  tells  us  where  to  move 
Rj,  instead  of  indicating  which  record  should  be  moved  into  the  place  of  Rj. 
(Thus the COUNT table specifies the inverse of the permutation $p(1) \ldots p(N)$; see
Section 5.1.1.)

Table  1 illustrates  the  typical  behavior  of  comparison  counting,  by  applying 
it  to  16  numbers  that  were  chosen  at  random  by  the  author  on  March  19,  1963. 
The  same  16  numbers  will  be  used  to  illustrate  almost  all  of  the  other  methods 
that  we  shall  discuss  later. 

In  our  discussion  preceding  this  algorithm  we  blithely  assumed  that  no  two 
keys  were  equal.  This  was  a potentially  dangerous  assumption,  for  if  equal 
keys  corresponded  to  equal  COUNTS  the  final  rearrangement  of  records  would  be 
quite  complicated.  Fortunately,  however,  Algorithm  C gives  the  correct  result 
no  matter  how  many  equal  keys  are  present;  see  exercise  2. 

Table 1
SORTING BY COUNTING (ALGORITHM C)

KEYS:             503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
COUNT (init.):      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
COUNT (i = N):      0   0   0   0   1   0   1   0   0   0   0   0   0   0   1  12
COUNT (i = N-1):    0   0   0   0   2   0   2   0   0   0   0   0   0   0  13  12
COUNT (i = N-2):    0   0   0   0   3   0   3   0   0   0   0   0   0  11  13  12
COUNT (i = N-3):    0   0   0   0   4   0   4   0   1   0   0   0   9  11  13  12
COUNT (i = N-4):    0   0   1   0   5   0   5   0   2   0   0   7   9  11  13  12
COUNT (i = N-5):    1   0   2   0   6   1   6   1   3   1   2   7   9  11  13  12
  ...
COUNT (i = 2):      6   1   8   0  15   3  14   4  10   5   2   7   9  11  13  12

Fig. 8. Algorithm C: Comparison counting.

Program C (Comparison counting). The following MIX implementation of
Algorithm C assumes that $R_j$ is stored in location INPUT + j, and COUNT[j] in
location COUNT + j, for $1 \le j \le N$; rI1 $\equiv i$; rI2 $\equiv j$; rA $\equiv K_i \equiv R_i$;
rX $\equiv$ COUNT[i].

```
01  START  ENT1  N          1      C1. Clear COUNTs.
02         STZ   COUNT,1    N      COUNT[i] ← 0.
03         DEC1  1          N
04         J1P   *-2        N      N ≥ i > 0.
05         ENT1  N          1      C2. Loop on i.
06         JMP   1F         1
07  2H     LDA   INPUT,1    N-1
08         LDX   COUNT,1    N-1
09  3H     CMPA  INPUT,2    A      C4. Compare Ki : Kj.
10         JGE   4F         A      Jump if Ki ≥ Kj.
11         LD3   COUNT,2    B      COUNT[j]
12         INC3  1          B        + 1
13         ST3   COUNT,2    B        → COUNT[j].
14         JMP   5F         B
15  4H     INCX  1          A-B    COUNT[i] ← COUNT[i] + 1.
16  5H     DEC2  1          A      C3. Loop on j.
17         J2P   3B         A
18         STX   COUNT,1    N-1
19         DEC1  1          N-1
20  1H     ENT2  -1,1       N      N ≥ i > j ≥ 0.
21         J2P   2B         N      |
```

The running time of this program is $13N + 6A + 5B - 4$ units, where $N$ is
the number of records; $A$ is the number of choices of two things from a set of
$N$ objects, namely $\binom{N}{2} = (N^2 - N)/2$; and $B$ is the number of pairs of indices
for which $j < i$ and $K_j > K_i$. Thus, $B$ is the number of inversions of the
permutation $K_1 \ldots K_N$; this is the quantity that was analyzed extensively in
Section 5.1.1, where we found in Eqs. 5.1.1-(12) and 5.1.1-(13) that, for unequal
keys in random order, we have

$$B = \bigl(\min 0,\ \operatorname{ave} (N^2 - N)/4,\ \max (N^2 - N)/2,\ \operatorname{dev} \tfrac16\sqrt{N(N-1)(N+2.5)}\,\bigr).$$
Hence Program C requires between $3N^2 + 10N - 4$ and $5.5N^2 + 7.5N - 4$ units
of time, and the average running time lies halfway between these two extremes.
For example, the data in Table 1 has $N = 16$, $A = 120$, $B = 41$, so Program C
will sort it in $1129u$. See exercise 5 for a modification of Program C that has
slightly different timing characteristics.
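The timing formula is easy to evaluate mechanically. This small Python sketch (function names ours) counts the inversions $B$ directly and substitutes $A = \binom{N}{2}$ and $B$ into $13N + 6A + 5B - 4$.

```python
from math import comb


def inversions(keys):
    """B: the number of pairs j < i with K_j > K_i (a direct quadratic count)."""
    n = len(keys)
    return sum(1 for i in range(n) for j in range(i) if keys[j] > keys[i])


def program_c_time(keys):
    """Evaluate 13N + 6A + 5B - 4, the running time of Program C in units u."""
    n = len(keys)
    a = comb(n, 2)            # A = (N^2 - N)/2 executions of the inner loop
    b = inversions(keys)      # B = number of inversions
    return 13 * n + 6 * a + 5 * b - 4


keys = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
```

Here `inversions(keys)` gives $B = 41$ and `program_c_time(keys)` gives 1129, as in the text. An already sorted file ($B = 0$) gives $3N^2 + 10N - 4$.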

The factor $N^2$ that dominates this running time shows that Algorithm C
is not an efficient way to sort when $N$ is large; doubling the number of records
increases the running time fourfold. Since the method requires a comparison of
all distinct pairs of keys $(K_i, K_j)$, there is no apparent way to get rid of the
dependence on $N^2$, although we will see later in this chapter that the worst-case
running time for sorting can be reduced to order $N \log N$ using other techniques.
Our  main  interest  in  Algorithm  C is  its  simplicity,  not  its  speed.  Algorithm  C 
serves  as  an  example  of  the  style  in  which  we  will  be  describing  more  complex 
(and  more  efficient)  methods. 

There  is  another  way  to  sort  by  counting  that  is  quite  important  from  the 
standpoint  of  efficiency;  it  is  primarily  applicable  in  the  case  that  many  equal 
keys are present, and when all keys fall into the range $u \le K_j \le v$, where $(v - u)$
is  small.  These  assumptions  appear  to  be  quite  restrictive,  but  in  fact  we  shall 
see  quite  a few  applications  of  the  idea.  For  example,  if  we  apply  this  method 
to  the  leading  digits  of  keys  instead  of  applying  it  to  entire  keys,  the  file  will  be 
partially  sorted  and  it  will  be  comparatively  simple  to  complete  the  job. 

In  order  to  understand  the  principles  involved,  suppose  that  all  keys  lie 
between 1 and 100. In one pass through the file we can count how many 1s, 2s,
..., 100s are present; and in a second pass we can move the records into the
appropriate  place  in  an  output  area.  The  following  algorithm  spells  things  out 
in  complete  detail: 

Algorithm D (Distribution counting). Assuming that all keys are integers in
the range $u \le K_j \le v$ for $1 \le j \le N$, this algorithm sorts the records $R_1, \ldots, R_N$
by making use of an auxiliary table COUNT[u], ..., COUNT[v]. At the conclusion
of the algorithm the records are moved to an output area $S_1, \ldots, S_N$ in the
desired order.

D1. [Clear COUNTs.] Set COUNT[u] through COUNT[v] all to zero.

D2. [Loop on $j$.] Perform step D3 for $1 \le j \le N$; then go to step D4.

D3. [Increase COUNT[$K_j$].] Increase the value of COUNT[$K_j$] by 1.

D4. [Accumulate.] (At this point COUNT[i] is the number of keys that are equal
to $i$.) Set COUNT[i] $\leftarrow$ COUNT[i] + COUNT[i-1], for $i = u+1, u+2, \ldots, v$.

D5. [Loop on $j$.] (At this point COUNT[i] is the number of keys that are less than
or equal to $i$; in particular, COUNT[v] = $N$.) Perform step D6 for $j = N$,
$N - 1, \ldots, 1$; then terminate the algorithm.

D6. [Output $R_j$.] Set $i \leftarrow$ COUNT[$K_j$], $S_i \leftarrow R_j$, and COUNT[$K_j$] $\leftarrow i - 1$. |

An  example  of  this  algorithm  is  worked  out  in  exercise  6;  a MIX  program  appears 
in exercise 9. When the range $v - u$ is small, this sorting procedure is very fast.
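Here is a minimal Python rendering of Algorithm D, assuming integer keys extracted by a caller-supplied `key` function. The name `distribution_counting` and the 0-based output list are our own choices, and COUNT[k] is stored at list index k - u.

```python
def distribution_counting(records, key, u, v):
    """Algorithm D: sort records whose integer keys satisfy u <= key <= v."""
    count = [0] * (v - u + 1)                  # D1. Clear COUNTs.
    for record in records:                     # D2, D3. Tally each key.
        count[key(record) - u] += 1
    for i in range(1, v - u + 1):              # D4. Accumulate: COUNT[i] = #keys <= i.
        count[i] += count[i - 1]
    output = [None] * len(records)             # Output area S_1 .. S_N.
    for record in reversed(records):           # D5. j = N, N-1, ..., 1.
        k = key(record) - u
        i = count[k]                           # D6. i <- COUNT[K_j],
        output[i - 1] = record                 #     S_i <- R_j (1-based i is index i-1),
        count[k] = i - 1                       #     COUNT[K_j] <- i - 1.
    return output
```

For instance, `distribution_counting(["5T", "0C", "5U"], key=lambda r: int(r[0]), u=0, v=9)` returns `["0C", "5T", "5U"]`.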

Fig. 9. Algorithm D: Distribution counting.

Sorting  by  comparison  counting  as  in  Algorithm  C was  first  mentioned  in 
print  by  E.  H.  Friend  [JACM  3 (1956),  152],  although  he  didn’t  claim  it  as  his 
own  invention.  Distribution  sorting  as  in  Algorithm  D was  first  developed  by 
H.  Seward  in  1954  for  use  with  radix  sorting  techniques  that  we  will  discuss 
later  (see  Section  5.2.5);  it  was  also  published  under  the  name  “Mathsort”  by 
W.  Feurzeig,  CACM  3 (1960),  601. 

EXERCISES 

1.  [15]  Would  Algorithm  C still  work  if  i varies  from  2 up  to  N in  step  C2,  instead 
of  from  N down  to  2?  What  if  j varies  from  1 up  to  i - 1 in  step  C3? 

2. [21] Show that Algorithm C works properly when equal keys are present. If
$K_j = K_i$ and $j < i$, does $R_j$ come before or after $R_i$ in the final ordering?

► 3. [21] Would Algorithm C still work properly if the test in step C4 were changed
from "$K_i < K_j$" to "$K_i \le K_j$"?

4.  [16]  Write  a MIX  program  that  “finishes”  the  sorting  begun  by  Program  C;  your 
program should transfer the keys to locations OUTPUT+1 through OUTPUT+N, in ascending
order.  How  much  time  does  your  program  require? 

5.  [22]  Does  the  following  set  of  changes  improve  Program  C? 

New  line  08a:  INCX  0,2 
Change  line  10:  JGE  5F 
Change  line  14:  DECX  1 
Delete  line  15. 

6.  [18]  Simulate  Algorithm  D by  hand,  showing  intermediate  results  when  the  16 
records 5T, 0C, 5U, 0O, 9., 1N, 8S, 2R, 6A, 4A, 1G, 5L, 6T, 6I, 7O, 7N are being sorted.
Here  the  numeric  digit  is  the  key,  and  the  alphabetic  information  is  just  carried  along 
with  the  records. 

7.  [13]  Is  Algorithm  D a stable  sorting  method? 

8.  [15]  Would  Algorithm  D still  work  properly  if  j were  to  vary  from  1 up  to  N in 
step  D5,  instead  of  from  N down  to  1? 

9.  [23]  Write  a MIX  program  for  Algorithm  D,  analogous  to  Program  C and  exercise  4. 
What is the execution time of your program, as a function of $N$ and $(v - u)$?

10. [25] Design an efficient algorithm that replaces the $N$ quantities $(R_1, \ldots, R_N)$ by
$(R_{p(1)}, \ldots, R_{p(N)})$, respectively, given the values of $R_1, \ldots, R_N$ and the permutation
$p(1) \ldots p(N)$ of $\{1, \ldots, N\}$. Try to avoid using excess memory space. (This problem
arises if we wish to rearrange records in memory after an address table sort, without
having enough room to store $2N$ records.)

11.  [M27]  Write  a MIX  program  for  the  algorithm  of  exercise  10,  and  analyze  its 
efficiency. 

► 12. [25] Design an efficient algorithm suitable for rearranging the records $R_1, \ldots, R_N$
into  sorted  order,  after  a list  sort  (Fig.  7)  has  been  completed.  Try  to  avoid  using 
excess  memory  space. 

► 13. [27] Algorithm D requires space for $2N$ records $R_1, \ldots, R_N$ and $S_1, \ldots, S_N$. Show
that it is possible to get by with only $N$ records $R_1, \ldots, R_N$, if a new unshuffling
procedure is substituted for steps D5 and D6. (Thus the problem is to design an
algorithm that rearranges $R_1, \ldots, R_N$ in place, based on the values of COUNT[u], ...,
COUNT[v] after step D4, without using additional memory space; this is essentially a
generalization of the problem considered in exercise 10.)

5.2.1.  Sorting  by  Insertion 

One of the important families of sorting techniques is based on the "bridge
player" method mentioned near the beginning of Section 5.2: Before examining
record $R_j$, we assume that the preceding records $R_1, \ldots, R_{j-1}$ have already
been sorted; then we insert $R_j$ into its proper place among the previously sorted
records. Several interesting variations on this basic theme are possible.

Straight  insertion.  The  simplest  insertion  sort  is  the  most  obvious  one. 
Assume that $1 < j \le N$ and that records $R_1, \ldots, R_{j-1}$ have been rearranged so
that

$$K_1 \le K_2 \le \cdots \le K_{j-1}.$$

(Remember that, throughout this chapter, $K_j$ denotes the key portion of $R_j$.)
We compare the new key $K_j$ with $K_{j-1}, K_{j-2}, \ldots$, in turn, until discovering
that $R_j$ should be inserted between records $R_i$ and $R_{i+1}$; then we move records
$R_{i+1}, \ldots, R_{j-1}$ up one space and put the new record into position $i + 1$. It is
convenient  to  combine  the  comparison  and  moving  operations,  interleaving  them 
as  shown  in  the  following  algorithm;  since  Rj  “settles  to  its  proper  level”  this 
method  of  sorting  has  often  been  called  the  sifting  or  sinking  technique. 


Fig.  10.  Algorithm  S:  Straight  insertion. 


Algorithm S (Straight insertion sort). Records $R_1, \ldots, R_N$ are rearranged in
place; after sorting is complete, their keys will be in order, $K_1 \le \cdots \le K_N$.

S1. [Loop on $j$.] Perform steps S2 through S5 for $j = 2, 3, \ldots, N$; then terminate
the algorithm.

S2. [Set up $i$, $K$, $R$.] Set $i \leftarrow j - 1$, $K \leftarrow K_j$, $R \leftarrow R_j$. (In the following steps
we will attempt to insert $R$ into the correct position, by comparing $K$ with
$K_i$ for decreasing values of $i$.)

S3. [Compare $K : K_i$.] If $K \ge K_i$, go to step S5. (We have found the desired
position for record $R$.)

S4. [Move $R_i$, decrease $i$.] Set $R_{i+1} \leftarrow R_i$, then $i \leftarrow i - 1$. If $i > 0$, go back to
step S3. (If $i = 0$, $K$ is the smallest key found so far, so record $R$ belongs in
position 1.)

S5. [$R$ into $R_{i+1}$.] Set $R_{i+1} \leftarrow R$. |
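Algorithm S can be sketched in Python as follows (0-based indices, names ours). The `while` condition combines the test of step S3 with the test "$i > 0$" of step S4, which becomes `i >= 0` after the shift to 0-based indexing.

```python
def straight_insertion(records, key=lambda r: r):
    """Algorithm S: sort the list in place by straight insertion."""
    for j in range(1, len(records)):            # S1. j = 2, ..., N (0-based 1 .. N-1).
        i, r = j - 1, records[j]                # S2. i <- j-1, R <- R_j, K = key(R).
        k = key(r)
        while i >= 0 and k < key(records[i]):   # S3. Leave the loop (go to S5) once K >= K_i.
            records[i + 1] = records[i]         # S4. R_{i+1} <- R_i, i <- i-1.
            i -= 1
        records[i + 1] = r                      # S5. R_{i+1} <- R.
    return records
```

Because a record moves past another only when its key is strictly smaller, records with equal keys keep their original relative order.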

Table 1 shows how our sixteen example numbers are sorted by Algorithm S. This
method is extremely easy to implement on a computer; in fact the following MIX
program is the shortest decent sorting routine in this book.

Table 1
EXAMPLE OF STRAIGHT INSERTION

503:087
^
087 503:512
       ^
087 503 512:061
^
061 087 503 512:908
               ^
061 087 503 512 908:170
       ^
061 087 170 503 512 908:897
                   ^
...
061 087 154 170 275 426 503 509 512 612 653 677 765 897 908:703
                                               ^
061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908


Program S (Straight insertion sort). The records to be sorted are in locations
INPUT+1 through INPUT+N; they are sorted in place in the same area, on a full-word
key. rI1 $\equiv j - N$; rI2 $\equiv i$; rA $\equiv R \equiv K$; assume that $N \ge 2$.

```
01  START  ENT1  2-N        1          S1. Loop on j. j ← 2.
02  2H     LDA   INPUT+N,1  N-1        S2. Set up i, K, R.
03         ENT2  N-1,1      N-1        i ← j - 1.
04  3H     CMPA  INPUT,2    B+N-1-A    S3. Compare K : Ki.
05         JGE   5F         B+N-1-A    To S5 if K ≥ Ki.
06  4H     LDX   INPUT,2    B          S4. Move Ri, decrease i.
07         STX   INPUT+1,2  B          Ri+1 ← Ri.
08         DEC2  1          B          i ← i - 1.
09         J2P   3B         B          To S3 if i > 0.
10  5H     STA   INPUT+1,2  N-1        S5. R into Ri+1.
11         INC1  1          N-1
12         J1NP  2B         N-1        2 ≤ j ≤ N.  |
```
The running time of this program is $9B + 10N - 3A - 9$ units, where $N$ is
the number of records sorted, $A$ is the number of times $i$ decreases to zero in
step S4, and $B$ is the number of moves. Clearly $A$ is the number of times
$K_j < \min(K_1, \ldots, K_{j-1})$ for $1 < j \le N$; this is one less than the number of
left-to-right minima, so $A$ is equivalent to the quantity that was analyzed carefully
in Section 1.2.10. Some reflection shows us that $B$ is also a familiar quantity:
The number of moves for fixed $j$ is the number of inversions of $K_j$, so $B$ is
the total number of inversions of the permutation $K_1 K_2 \ldots K_N$. Hence by Eqs.
1.2.10-(16), 5.1.1-(12), and 5.1.1-(13), we have

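$$A = \bigl(\min 0,\ \operatorname{ave} H_N - 1,\ \max N - 1,\ \operatorname{dev} \sqrt{H_N - H_N^{(2)}}\,\bigr);$$

$$B = \bigl(\min 0,\ \operatorname{ave} (N^2 - N)/4,\ \max (N^2 - N)/2,\ \operatorname{dev} \tfrac16\sqrt{N(N-1)(N+2.5)}\,\bigr);$$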

and the average running time of Program S, assuming that the input keys are
distinct and randomly ordered, is $(2.25N^2 + 7.75N - 3H_N - 6)u$. Exercise 33
explains how to improve this slightly.

The example data in Table 1 involves 16 items; there are two changes to the
left-to-right minimum, namely 087 and 061; and there are 41 inversions, as we
have seen in the previous section. Hence $N = 16$, $A = 2$, $B = 41$, and the total
sorting time is $514u$.
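The quantities $A$ and $B$ can be counted directly. This Python sketch (name ours) evaluates $9B + 10N - 3A - 9$ for a given file and, like Program S, assumes $N \ge 2$.

```python
def program_s_time(keys):
    """Return (9B + 10N - 3A - 9, A, B) for Program S on the given keys."""
    n = len(keys)
    # A: the number of j > 1 with K_j < min(K_1, ..., K_{j-1}),
    #    i.e. one less than the number of left-to-right minima.
    a, smallest = 0, keys[0]
    for k in keys[1:]:
        if k < smallest:
            a, smallest = a + 1, k
    # B: the number of moves in step S4, which equals the number of inversions.
    b = sum(1 for i in range(n) for j in range(i) if keys[j] > keys[i])
    return 9 * b + 10 * n - 3 * a - 9, a, b


keys = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
```

On these keys `program_s_time(keys)` evaluates to `(514, 2, 41)`, matching the figures quoted for Table 1.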

Binary insertion and two-way insertion. While the $j$th record is being
processed during a straight insertion sort, we compare its key with about $j/2$
of the previously sorted keys, on the average; therefore the total number of
comparisons performed comes to roughly $(1 + 2 + \cdots + N)/2 \approx N^2/4$, and this
gets very large when $N$ is only moderately large. In Section 6.2.1 we shall
study "binary search" techniques, which show where to insert the $j$th item
after only about $\lg j$ well-chosen comparisons have been made. For example,
when inserting the 64th record we can start by comparing $K_{64}$ with $K_{32}$; if it
is less, we compare it with $K_{16}$, but if it is greater we compare it with $K_{48}$,
etc., so that the proper place to insert $R_{64}$ will be known after making only six
comparisons. The total number of comparisons for inserting all $N$ items comes
to about $N \lg N$, a substantial improvement over $\frac14 N^2$; and Section 6.2.1 shows
that the corresponding program need not be much more complicated than a
program for straight insertion. This method is called binary insertion; it was
mentioned by John Mauchly as early as 1946, in the first published discussion
of computer sorting.
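Binary insertion can be sketched in Python as follows. This is our own illustration, not the program of Section 6.2.1. The search spends about $\lg j$ comparisons, but the slice assignment still shifts every sorted record that lies to the right of the insertion point.

```python
def binary_insertion(records):
    """Sort a list in place by binary insertion; return the list together with
    the number of key comparisons spent locating insertion points."""
    comparisons = 0
    for j in range(1, len(records)):
        r = records[j]
        lo, hi = 0, j                     # the sorted prefix is records[0:j]
        while lo < hi:                    # binary search: about lg j probes
            mid = (lo + hi) // 2
            comparisons += 1
            if r < records[mid]:
                hi = mid
            else:
                lo = mid + 1              # equal keys go to the right (stable)
        # The moves remain: shift records[lo:j] up one place to make room.
        records[lo + 1 : j + 1] = records[lo:j]
        records[lo] = r
    return records, comparisons
```

With a sorted prefix of $j$ records the loop makes at most $\lfloor \lg j \rfloor + 1$ probes; for the 64th record ($j = 63$) that is six comparisons.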

The unfortunate difficulty with binary insertion is that it solves only half
of the problem; after we have found where record $R_j$ is to be inserted, we still
need to move about $\frac12 j$ of the previously sorted records in order to make room
for $R_j$, so the total running time is still essentially proportional to $N^2$. Some
early computers such as the IBM 705 had a built-in "tumble" instruction that did
such move operations at high speed, and modern machines can do the moves even
faster with special hardware attachments; but as $N$ increases, the dependence
on $N^2$ eventually takes over. For example, an analysis by H. Nagler [CACM 3
(1960), 618-620] indicated that binary insertion could not be recommended for
sorting more than about $N = 128$ records on the IBM 705, when each record
was 80 characters long, and similar analyses apply to other machines.

Of  course,  a clever  programmer  can  think  of  various  ways  to  reduce  the 
amount  of  moving  that  is  necessary;  the  first  such  trick,  proposed  early  in  the 
1950s,  is  illustrated  in  Table  2.  Here  the  first  item  is  placed  in  the  center  of  an 
output  area,  and  space  is  made  for  subsequent  items  by  moving  to  the  right  or 
to  the  left,  whichever  is  most  convenient.  This  saves  about  half  the  running  time 
of  ordinary  binary  insertion,  at  the  expense  of  a somewhat  more  complicated 
program.  It  is  possible  to  use  this  method  without  using  up  more  space  than 
required  for  N records  (see  exercise  6);  but  we  shall  not  dwell  any  longer  on  this 
“two-way”  method  of  insertion,  since  considerably  more  interesting  techniques 
have  been  developed. 
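The two-way scheme of Table 2 can be sketched in Python under two simplifying assumptions of ours. First, the output area simply has room for $2N - 1$ records, rather than the economical layout of exercise 6. Second, a tie between moving left and moving right is resolved by moving left. A binary search locates each insertion point.

```python
def two_way_insertion(keys):
    """Insert each key into an output area that can grow in either direction,
    shifting whichever side of the insertion point holds fewer records."""
    n = len(keys)
    if n == 0:
        return []
    out = [None] * (2 * n - 1)       # room to grow n-1 places either way
    left = right = n - 1             # occupied area is out[left..right]
    out[left] = keys[0]              # first item goes in the center
    for k in keys[1:]:
        # Binary search for p: the first slot in out[left..right] holding a key > k.
        lo, hi = left, right + 1
        while lo < hi:
            mid = (lo + hi) // 2
            if k < out[mid]:
                hi = mid
            else:
                lo = mid + 1
        p = lo
        if p - left <= right + 1 - p:
            # Fewer records to the left: shift out[left..p-1] one place left.
            out[left - 1 : p - 1] = out[left:p]
            left -= 1
            out[p - 1] = k
        else:
            # Fewer records to the right: shift out[p..right] one place right.
            out[p + 1 : right + 2] = out[p : right + 1]
            right += 1
            out[p] = k
    return out[left : right + 1]
```

With these choices, feeding in the first eight keys of Table 1 shifts to the left or to the right exactly as the rows of Table 2 do.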


Table 2
TWO-WAY INSERTION

                503
            087 503
            087 503 512
        061 087 503 512
        061 087 503 512 908
    061 087 170 503 512 908
    061 087 170 503 512 897 908
061 087 170 275 503 512 897 908

Shell’s  method.  If  we  have  a sorting  algorithm  that  moves  items  only  one 
position at a time, its average time will be, at best, proportional to $N^2$, since
each record must travel an average of about $\frac13 N$ positions during the sorting
process  (see  exercise  7).  Therefore,  if  we  want  to  make  substantial  improvements 
over  straight  insertion,  we  need  some  mechanism  by  which  the  records  can  take 
long  leaps  instead  of  short  steps. 
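The figure of about $\frac13 N$ can be checked by brute force for small $N$. In this Python sketch (our own), the exact average $(N^2 - 1)/(3N)$ used for comparison is a standard fact we add here, not a formula from the text. It holds because, for a random permutation and a random position $i$, the values $p(i)$ and $i$ are independent and uniformly distributed on $1, \ldots, N$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def average_travel(n):
    """Exact average of |p(i) - i| over all positions i and all n! permutations p."""
    total = sum(abs(p[i] - i) for p in permutations(range(n)) for i in range(n))
    return Fraction(total, n * factorial(n))
```

For example, `average_travel(3)` is `Fraction(8, 9)`, and the values approach $N/3$ as $N$ grows.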

Such a method was proposed in 1959 by Donald L. Shell [CACM 2, 7
(July 1959), 30-32], and it became known as shellsort. Table 3 illustrates the
general idea behind the method: First we divide the 16 records into 8 groups
of two each, namely $(R_1, R_9), (R_2, R_{10}), \ldots, (R_8, R_{16})$. Sorting each group of
records separately takes us to the second line of Table 3; this is called the "first
pass." Notice that 154 has changed places with 512; 908 and 897 have both
jumped to the right. Now we divide the records into 4 groups of four each,
namely $(R_1, R_5, R_9, R_{13}), \ldots, (R_4, R_8, R_{12}, R_{16})$, and again each group is sorted
separately; this "second pass" takes us to line 3. A third pass sorts two groups
of eight records, then a fourth pass completes the job by sorting all 16 records.
Each of the intermediate sorting processes involves either a comparatively short
file or a file that is comparatively well ordered, so straight insertion can be used
for each sorting operation. In this way the records tend to converge quickly to
their final destinations.

Table 3
SHELLSORT WITH INCREMENTS 8, 4, 2, 1

         503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
8-sort:  503 087 154 061 612 170 765 275 653 426 512 509 908 677 897 703
4-sort:  503 087 154 061 612 170 512 275 653 426 765 509 908 677 897 703
2-sort:  154 061 503 087 512 170 612 275 653 426 765 509 897 677 908 703
1-sort:  061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

Shellsort  is  also  known  as  the  “diminishing  increment  sort,”  since  each  pass 
is  defined  by  an  increment  h such  that  we  sort  the  records  that  are  h units  apart. 
The sequence of increments 8, 4, 2, 1 is not sacred; indeed, any sequence $h_{t-1}$,
$h_{t-2}, \ldots, h_0$ can be used, so long as the last increment $h_0$ equals 1. For example,
Table  4 shows  the  same  data  sorted  with  increments  7,  5,  3,  1.  Some  sequences 
are  much  better  than  others;  we  will  discuss  the  choice  of  increments  later. 

Algorithm D (Shellsort). Records $R_1, \ldots, R_N$ are rearranged in place; after
sorting is complete, their keys will be in order, $K_1 \le \cdots \le K_N$. An auxiliary
sequence of increments $h_{t-1}, h_{t-2}, \ldots, h_0$ is used to control the sorting process,
where $h_0 = 1$; proper choice of these increments can significantly decrease the
sorting time. This algorithm reduces to Algorithm S when $t = 1$.

D1. [Loop on $s$.] Perform step D2 for $s = t-1, t-2, \ldots, 0$; then terminate the
algorithm.

D2. [Loop on $j$.] Set $h \leftarrow h_s$, and perform steps D3 through D6 for $h < j \le N$.
(We will use a straight insertion method to sort elements that are $h$ positions
apart, so that $K_i \le K_{i+h}$ for $1 \le i \le N - h$. Steps D3 through D6 are
essentially the same as steps S2 through S5, respectively, in Algorithm S.)

D3. [Set up $i$, $K$, $R$.] Set $i \leftarrow j - h$, $K \leftarrow K_j$, $R \leftarrow R_j$.

D4. [Compare $K : K_i$.] If $K \ge K_i$, go to step D6.

D5. [Move $R_i$, decrease $i$.] Set $R_{i+h} \leftarrow R_i$, then $i \leftarrow i - h$. If $i > 0$, go back to
step D4.

D6. [$R$ into $R_{i+h}$.] Set $R_{i+h} \leftarrow R$. |
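Algorithm D can be sketched in Python as follows. This is a minimal 0-based rendering in which `h_sort` performs steps D3 through D6 for a single increment, and `shellsort` supplies the increments $h_{t-1}, \ldots, h_0$ in the order they are used. Both names are ours, and the check on the final increment simply enforces the condition $h_0 = 1$ stated above.

```python
def h_sort(records, h):
    """Steps D3-D6 for one increment h: straight insertion on elements h apart."""
    for j in range(h, len(records)):           # D2. h < j <= N (0-based h .. N-1).
        i, r = j - h, records[j]               # D3. i <- j-h, R <- R_j.
        while i >= 0 and r < records[i]:       # D4. Go to D6 once K >= K_i.
            records[i + h] = records[i]        # D5. R_{i+h} <- R_i, i <- i-h.
            i -= h
        records[i + h] = r                     # D6. R_{i+h} <- R.
    return records


def shellsort(records, increments):
    """Algorithm D: apply h_sort for h = h_{t-1}, ..., h_0, where h_0 must be 1."""
    if not increments or increments[-1] != 1:
        raise ValueError("the last increment h_0 must be 1")
    for h in increments:                       # D1. s = t-1, t-2, ..., 0.
        h_sort(records, h)
    return records
```

Calling `h_sort` with $h = 8, 4, 2, 1$ in turn on the sixteen keys reproduces the successive lines of Table 3, and the increments 7, 5, 3, 1 pass through the lines of Table 4.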

The  corresponding  MIX  program  is  not  much  longer  than  our  program  for 
straight  insertion.  Lines  08-19  of  the  following  code  are  a direct  translation  of 
Program  S into  the  more  general  framework  of  Algorithm  D. 

Program D (Shellsort). We assume that the increments are stored in an
auxiliary table, with $h_s$ in location H + s; all increments are less than $N$. Register
assignments: rI1 $\equiv j - N$; rI2 $\equiv i$; rA $\equiv R \equiv K$; rI3 $\equiv s$; rI4 $\equiv h$. Note that this
program modifies itself, in order to obtain efficient execution of the inner loop.

```
01  START  ENT3  T-1        1           D1. Loop on s. s ← t - 1.
02  1H     LD4   H,3        T           D2. Loop on j. h ← hs.
03         ENT1  INPUT,4    T           Modify the addresses of three
04         ST1   5F(0:2)    T             instructions in the main loop.
05         ST1   6F(0:2)    T
06         ENN1  -N,4       T           rI1 ← N - h.
07         ST1   3F(0:2)    T
08         ENT1  1-N,4      T           j ← h + 1.
09  2H     LDA   INPUT+N,1  NT-S        D3. Set up i, K, R.
10  3H     ENT2  N-H,1      NT-S        i ← j - h. [Instruction modified]
11  4H     CMPA  INPUT,2    B+NT-S-A    D4. Compare K : Ki.
12         JGE   6F         B+NT-S-A    To D6 if K ≥ Ki.
13         LDX   INPUT,2    B           D5. Move Ri, decrease i.
14  5H     STX   INPUT+H,2  B           Ri+h ← Ri. [Instruction modified]
15         DEC2  0,4        B           i ← i - h.
16         J2P   4B         B           To D4 if i > 0.
17  6H     STA   INPUT+H,2  NT-S        D6. R into Ri+h. [Instruction modified]
18  7H     INC1  1          NT-S        j ← j + 1.
19         J1NP  2B         NT-S        To D3 if j ≤ N.
20         DEC3  1          T
21         J3NN  1B         T           t > s ≥ 0.  |
```

Table 4
SHELLSORT WITH INCREMENTS 7, 5, 3, 1

         503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
7-sort:  275 087 426 061 509 170 677 503 653 512 154 908 612 897 765 703
5-sort:  154 087 426 061 509 170 677 503 653 512 275 908 612 897 765 703
3-sort:  061 087 170 154 275 426 512 503 653 612 509 765 677 897 908 703
1-sort:  061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

*Analysis of shellsort. In order to choose a good sequence of increments
$h_{t-1}, \ldots, h_0$ for use in Algorithm D, we need to analyze the running time as
a function  of  those  increments.  This  leads  to  some  fascinating  mathematical 
problems,  not  yet  completely  resolved;  nobody  has  been  able  to  determine 
the  best  possible  sequence  of  increments  for  large  values  of  N.  Yet  a good 
many  interesting  facts  are  known  about  the  behavior  of  shellsort,  and  we  will 
summarize  them  here;  details  appear  in  the  exercises  below.  [Readers  who  are 
not  mathematically  inclined  should  skim  over  the  next  few  pages,  continuing 
with  the  discussion  of  list  insertion  following  (12).] 
The frequency counts shown with Program D indicate that five factors
determine the execution time: the size of the file, $N$; the number of passes
(that is, the number of increments), $T = t$; the sum of the increments,

$$S = h_0 + \cdots + h_{t-1};$$

the number of comparisons, $B + NT - S - A$; and the number of moves, $B$. As
in the analysis of Program S, $A$ is essentially the number of left-to-right minima
encountered in the intermediate sorting operations, and $B$ is the number of
inversions in the subfiles. The factor that governs the running time is $B$, so we
shall devote most of our attention to it. For purposes of analysis we shall assume
that the keys are distinct and initially in random order.

Let us call the operation of step D2 "$h$-sorting," so that shellsort consists
of $h_{t-1}$-sorting, followed by $h_{t-2}$-sorting, ..., followed by $h_0$-sorting. A file in
which $K_i \le K_{i+h}$ for $1 \le i \le N - h$ will be called "$h$-ordered."
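The definition of an $h$-ordered file is a one-line test. This hypothetical helper (our own name) checks it for a 0-based list.

```python
def is_h_ordered(keys, h):
    """True when K_i <= K_{i+h} for 1 <= i <= N - h (0-based: 0 <= i < N - h)."""
    return all(keys[i] <= keys[i + h] for i in range(len(keys) - h))


original = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
after_8_sort = [503, 87, 154, 61, 612, 170, 765, 275, 653, 426, 512, 509, 908, 677, 897, 703]
```

The second line of Table 3 is 8-ordered, so `is_h_ordered(after_8_sort, 8)` is true. The original file is not 8-ordered, since $K_3 = 512 > K_{11} = 154$.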

Consider first the simplest generalization of straight insertion, when there
are just two increments, $h_1 = 2$ and $h_0 = 1$. In this case the second pass begins
with a 2-ordered sequence of keys, $K_1 K_2 \ldots K_N$. It is easy to see that the number
of permutations $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$ having $a_i < a_{i+2}$ for $1 \le i \le n - 2$ is

$$\binom{n}{\lfloor n/2 \rfloor},$$

since we obtain exactly one 2-ordered permutation for each choice of $\lfloor n/2 \rfloor$
elements to put in the even-numbered positions $a_2 a_4 \ldots$, while the remaining
$\lceil n/2 \rceil$ elements occupy the odd-numbered positions. Each 2-ordered permutation
is equally likely after a random file has been 2-sorted. What is the average
number of inversions among all such permutations?

Let $A_n$ be the total number of inversions among all 2-ordered permutations
of $\{1, 2, \ldots, n\}$. Clearly $A_1 = 0$, $A_2 = 1$, $A_3 = 2$; and by considering the six
cases

1324  1234  1243  2134  2143  3142

we find that $A_4 = 1 + 0 + 1 + 1 + 2 + 3 = 8$. One way to investigate $A_n$ in
general is to consider the "lattice diagram" illustrated in Fig. 11 for $n = 15$.
A 2-ordered permutation of $\{1, 2, \ldots, n\}$ can be represented as a path from the
upper left corner point $(0, 0)$ to the lower right corner point $(\lceil n/2 \rceil, \lfloor n/2 \rfloor)$, if
we make the $k$th step of the path go downwards or to the right, respectively,
according as $k$ appears in an odd or an even position in the permutation. This
rule defines a one-to-one correspondence between 2-ordered permutations and
$n$-step paths from corner to corner of the lattice diagram; for example, the path
shown by the heavy line in Fig. 11 corresponds to the permutation

$$2\ 1\ 3\ 4\ 6\ 5\ 7\ 10\ 8\ 11\ 9\ 12\ 14\ 13\ 15. \tag{1}$$

Furthermore, we can attach “weights” to the vertical lines of the path, as Fig. 11
shows; a line from $(i, j)$ to $(i+1, j)$ gets weight $|i - j|$. A little study will convince
the reader that the sum of these weights along each path is equal to the number
of inversions of the corresponding permutation; this sum also equals the number
of shaded squares between the given path and the staircase path indicated by
heavy dots in the figure. (See exercise 12.) Thus, for example, (1) has
$1 + 0 + 1 + 0 + 1 + 2 + 1 + 0 = 6$ inversions.

Fig. 11. Correspondence between 2-ordering and paths in a lattice. Italicized numbers
are weights that yield the number of inversions in the 2-ordered permutation.
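The weight rule can be checked mechanically. The Python sketch below uses function names of our own choosing, purely for illustration. It builds the lattice path exactly as described, taking a vertical step for each value k that sits in an odd position. It then compares the summed weights $|i - j|$ with a direct count of inversions, first on permutation (1) and then exhaustively for all 2-ordered permutations with $n \le 8$.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs that are out of order (quadratic; fine for small examples)."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def is_2_ordered(perm):
    """True when a_i < a_{i+2} for every i, i.e. odd and even positions are each sorted."""
    return all(perm[i] < perm[i + 2] for i in range(len(perm) - 2))


def path_weight(perm):
    """Sum of the weights |i - j| on the vertical steps of perm's lattice path.

    The k-th step goes down, from (i, j) to (i + 1, j), when the value k sits
    in an odd (1-based) position of perm, and to the right otherwise.
    """
    position = {value: index + 1 for index, value in enumerate(perm)}
    i = j = total = 0
    for k in range(1, len(perm) + 1):
        if position[k] % 2 == 1:     # odd position: vertical step with weight |i - j|
            total += abs(i - j)
            i += 1
        else:                         # even position: horizontal step, no weight
            j += 1
    return total


example = (2, 1, 3, 4, 6, 5, 7, 10, 8, 11, 9, 12, 14, 13, 15)   # permutation (1)
print(path_weight(example), inversions(example))   # 6 6

# Exhaustive check of the correspondence for all 2-ordered permutations with n <= 8.
for n in range(1, 9):
    for perm in permutations(range(1, n + 1)):
        if is_2_ordered(perm):
            assert path_weight(perm) == inversions(perm)
```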

When $a \le a'$ and $b \le b'$, the number of relevant paths from $(a, b)$ to $(a', b')$
is the number of ways to mix $a' - a$ vertical lines with $b' - b$ horizontal lines,
namely

$$\binom{a' - a + b' - b}{a' - a};$$

hence the number of permutations whose corresponding path traverses the vertical
line segment from $(i, j)$ to $(i+1, j)$ is

$$\binom{i+j}{i}\binom{n-i-j-1}{\lfloor n/2\rfloor - j}.$$

Multiplying by the associated weight and summing over all segments gives

$$A_n = \sum_{0 \le i < \lceil n/2\rceil}\ \sum_{0 \le j \le \lfloor n/2\rfloor} |i - j| \binom{i+j}{i}\binom{n-i-j-1}{\lfloor n/2\rfloor - j}.$$

The absolute value signs in these sums make the calculations somewhat tricky,
but exercise 14 shows that $A_n$ has the surprisingly simple form $\lfloor n/2\rfloor\, 2^{n-2}$. Hence
the average number of inversions in a random 2-ordered permutation is

$$\frac{\lfloor n/2\rfloor\, 2^{n-2}}{\binom{n}{\lfloor n/2\rfloor}};$$

by Stirling’s approximation this is asymptotically $\sqrt{\pi/128}\; n^{3/2} \approx 0.15\, n^{3/2}$. The
maximum number of inversions is easily seen to be

$$\binom{\lfloor n/2\rfloor + 1}{2} \approx \tfrac{1}{8} n^2.$$
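The following sketch checks the formulas of this paragraph on small cases. It enumerates the 2-ordered permutations directly, producing one permutation per choice of the even-position values. It also evaluates the double sum over vertical segments. Both results are compared with the closed form $\lfloor n/2\rfloor 2^{n-2}$ and with the stated maximum. The names are ours. The last line compares the exact average for n = 1000 with the constant $\sqrt{\pi/128}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, pi, sqrt


def two_ordered_perms(n):
    """Yield each 2-ordered permutation of 1..n once: pick the even-position values."""
    values = range(1, n + 1)
    for evens in combinations(values, n // 2):
        perm = [0] * n
        perm[1::2] = evens                                   # positions 2, 4, 6, ...
        perm[0::2] = [v for v in values if v not in evens]   # positions 1, 3, 5, ...
        yield perm


def inversions(perm):
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def A_by_paths(n):
    """The double sum over vertical segments (i, j) -> (i + 1, j), weighted by |i - j|."""
    up, across = (n + 1) // 2, n // 2   # ceil(n/2) vertical and floor(n/2) horizontal steps
    return sum(abs(i - j) * comb(i + j, i) * comb(n - i - j - 1, across - j)
               for i in range(up) for j in range(across + 1))


def A_closed(n):
    """floor(n/2) * 2^(n-2), kept integral for n = 1."""
    return (n // 2) * 2 ** (n - 2) if n >= 2 else 0


def average_inversions(n):
    return Fraction(A_closed(n), comb(n, n // 2))


for n in range(1, 11):
    counts = [inversions(p) for p in two_ordered_perms(n)]
    assert len(counts) == comb(n, n // 2)
    assert sum(counts) == A_by_paths(n) == A_closed(n)
    assert max(counts) == comb(n // 2 + 1, 2)

print(float(average_inversions(1000)) / 1000 ** 1.5, sqrt(pi / 128))
```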

It is instructive to study the distribution of inversions more carefully, by
examining the generating functions

$$h_1(z) = 1,\quad h_2(z) = 1 + z,\quad h_3(z) = 1 + 2z,\quad h_4(z) = 1 + 3z + z^2 + z^3,\ \ldots,$$

as in exercise 15. In this way we find that the standard deviation is also
proportional to $n^{3/2}$, so the distribution is not extremely stable about the mean.
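To see the distribution itself, the coefficients of $h_n(z)$ can be tabulated by brute force. This sketch uses names of our own. It reproduces $h_1$ through $h_4$ above and prints the mean and the standard deviation, each divided by $n^{3/2}$, for a few larger n. It is only an illustration and is not the method of exercise 15.

```python
from itertools import combinations


def two_ordered_perms(n):
    values = range(1, n + 1)
    for evens in combinations(values, n // 2):
        perm = [0] * n
        perm[1::2] = evens
        perm[0::2] = [v for v in values if v not in evens]
        yield perm


def inversion_gf(n):
    """Coefficients [c_0, c_1, ...] of h_n(z): c_k counts 2-ordered perms with k inversions."""
    coeffs = []
    for perm in two_ordered_perms(n):
        k = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        coeffs.extend([0] * (k + 1 - len(coeffs)))   # grow the list when a new degree appears
        coeffs[k] += 1
    return coeffs


def mean_and_sd(coeffs):
    total = sum(coeffs)
    mean = sum(k * c for k, c in enumerate(coeffs)) / total
    var = sum((k - mean) ** 2 * c for k, c in enumerate(coeffs)) / total
    return mean, var ** 0.5


for n in range(1, 5):
    print(n, inversion_gf(n))
# 1 [1]
# 2 [1, 1]
# 3 [1, 2]
# 4 [1, 3, 1, 1]

for n in (8, 12, 16):
    mean, sd = mean_and_sd(inversion_gf(n))
    print(n, round(mean / n ** 1.5, 3), round(sd / n ** 1.5, 3))
```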

Now  let  us  consider  the  general  two-pass  case  of  Algorithm  D,  when  the 
increments  are  h and  1: 


Theorem H. The average number of inversions in an h-ordered permutation
of $\{1, 2, \ldots, n\}$ is

$$f(n,h) = \frac{2^{2q-1}\,q!\,q!}{(2q+1)!}\left(\binom{h}{2}q(q+1) + \binom{r}{2}(q+1) - \frac{1}{2}\binom{h-r}{2}q\right), \qquad (4)$$

where $q = \lfloor n/h\rfloor$ and $r = n \bmod h$.

This theorem is due to Douglas H. Hunt [Bachelor’s thesis, Princeton University
(April 1967)]. Note that when $h \ge n$ the formula correctly gives $f(n,h) = \frac{1}{2}\binom{n}{2}$.


Proof. An h-ordered permutation contains r sorted subsequences of length q + 1,
and h − r of length q. Each inversion comes from a pair of distinct subsequences,
and a given pair of distinct subsequences in a random h-ordered permutation
defines a random 2-ordered permutation. The average number of inversions
is therefore the sum of the average number of inversions between each pair of
distinct subsequences, namely

$$\binom{r}{2}\frac{A_{2q+2}}{\binom{2q+2}{q+1}} + r(h-r)\frac{A_{2q+1}}{\binom{2q+1}{q}} + \binom{h-r}{2}\frac{A_{2q}}{\binom{2q}{q}} = f(n,h).$$ |
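Formula (4) is easy to get wrong by a sign or a factor of 2. The sketch below evaluates it in exact rational arithmetic and compares it with a brute-force average over all h-ordered permutations for $n \le 7$. The function names `f` and `f_brute` are ours.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def f(n, h):
    """Formula (4): average inversions in a random h-ordered permutation of n elements."""
    q, r = divmod(n, h)                      # q = floor(n/h), r = n mod h
    lead = Fraction(2) ** (2 * q - 1) * factorial(q) ** 2 / factorial(2 * q + 1)
    return lead * (comb(h, 2) * q * (q + 1) + comb(r, 2) * (q + 1)
                   - Fraction(comb(h - r, 2) * q, 2))


def f_brute(n, h):
    """Same quantity by enumerating every h-ordered permutation (small n only)."""
    total = count = 0
    for p in permutations(range(n)):
        if all(p[i] < p[i + h] for i in range(n - h)):
            count += 1
            total += sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])
    return Fraction(total, count)


for n in range(1, 8):
    for h in range(1, 9):
        assert f(n, h) == f_brute(n, h), (n, h)

print(f(4, 2), f(5, 3))   # 4/3 8/3
```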


Corollary H. If the sequence of increments $h_{t-1}, \ldots, h_1, h_0$ satisfies the
condition

$$h_{s+1} \bmod h_s = 0, \qquad \text{for } t-1 > s \ge 0, \qquad (5)$$

then the average number of move operations in Algorithm D is

$$\sum_{t > s \ge 0}\Bigl(r_s f(q_s + 1,\, h_{s+1}/h_s) + (h_s - r_s) f(q_s,\, h_{s+1}/h_s)\Bigr), \qquad (6)$$

where $r_s = N \bmod h_s$, $q_s = \lfloor N/h_s\rfloor$, $h_t = N h_{t-1}$, and f is defined in (4).

Fig. 12. The average number, f(n, h), of inversions in an h-ordered file of n elements,
shown for n = 64.

Proof. The process of $h_s$-sorting consists of a straight insertion sort on $r_s$
$(h_{s+1}/h_s)$-ordered subfiles of length $q_s + 1$, and on $(h_s - r_s)$ such subfiles of
length $q_s$. The divisibility condition implies that each of these subfiles is a random
$(h_{s+1}/h_s)$-ordered permutation, in the sense that each $(h_{s+1}/h_s)$-ordered
permutation is equally likely, since we are assuming that the original input was
a random permutation of distinct elements. |

Condition (5) in this corollary is always satisfied for two-pass shellsorts,
when the increments are h and 1. If $q = \lfloor N/h\rfloor$ and $r = N \bmod h$, the quantity
B in Program D will have an average value of

$$r f(q+1, N) + (h-r) f(q, N) + f(N, h) = \frac{r}{2}\binom{q+1}{2} + \frac{h-r}{2}\binom{q}{2} + f(N, h).$$

To a first approximation, the function $f(n,h)$ equals $(\sqrt{\pi}/8)\, n^{3/2} h^{1/2}$; we can,
for example, compare it to the smooth curve in Fig. 12 when n = 64. Hence the
running time for a two-pass Program D is approximately proportional to

$$2N^2/h + \sqrt{\pi N^3 h}.$$

The best choice of h is therefore approximately $\sqrt[3]{16N/\pi} \approx 1.72\sqrt[3]{N}$; and with
this choice of h we get an average running time proportional to $N^{5/3}$.
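The balance point can be located numerically. This sketch, with names of our own, drops constant factors as the text does. It compares the integer h that minimizes $2N^2/h + \sqrt{\pi N^3 h}$ with the continuous optimum $\sqrt[3]{16N/\pi}$. For N = 1000 both give about 17, which matches the two-pass sequence "17 1" listed in Table 6.

```python
import math


def two_pass_cost(N, h):
    """Leading terms 2N^2/h + sqrt(pi N^3 h) of the two-pass running time (constants dropped)."""
    return 2 * N * N / h + math.sqrt(math.pi * N ** 3 * h)


def best_h(N):
    """Continuous minimizer of two_pass_cost: the cube root of 16N/pi."""
    return (16 * N / math.pi) ** (1 / 3)


N = 1000
print(round(best_h(N), 2))                                   # 17.21
print(min(range(1, N), key=lambda h: two_pass_cost(N, h)))   # 17
```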
Thus we can make a substantial improvement over straight insertion, from
$O(N^2)$ to $O(N^{1.667})$, just by using shellsort with two increments. Clearly we
can do even better when more increments are used. Exercise 18 discusses the
optimum choice of $h_{t-1}, \ldots, h_0$ when t is fixed and when the h’s are constrained
by the divisibility condition; the running time decreases to $O(N^{1.5+\epsilon/2})$, where
$\epsilon = 1/(2^t - 1)$, for large N. We cannot break the $N^{1.5}$ barrier by using the
formulas above, since the last pass always contributes

$$f(N, h_1) \approx (\sqrt{\pi}/8)\, N^{3/2} h_1^{1/2}$$

inversions to the sum.

But  our  intuition  tells  us  that  we  can  do  even  better  when  the  increments 
$h_{t-1}, \ldots, h_0$ do not satisfy the divisibility condition (5). For example, 8-sorting
followed by 4-sorting followed by 2-sorting does not allow any interaction between
keys in even and odd positions; therefore the final 1-sorting pass is inevitably
faced with $O(N^{3/2})$ inversions, on the average. By contrast, 7-sorting followed
by  5-sorting  followed  by  3-sorting  jumbles  things  up  in  such  a way  that  the  final 
1-sorting  pass  cannot  encounter  more  than  2 N inversions!  (See  exercise  26.) 
Indeed,  an  astonishing  phenomenon  occurs: 

Theorem  K.  If  a k-ordered  file  is  h-sorted,  it  remains  k-ordered. 

Thus  a file  that  is  first  7-sorted,  then  5-sorted,  becomes  both  7-ordered  and 
5-ordered.  And  if  we  3-sort  it,  the  result  is  ordered  by  7s,  5s,  and  3s.  Examples 
of  this  remarkable  property  can  be  seen  in  Table  4 on  page  85. 

Proof.  Exercise  20  shows  that  Theorem  K is  a consequence  of  the  following  fact: 

Lemma L. Let m, n, r be nonnegative integers, and let $(x_1, \ldots, x_{m+r})$ and
$(y_1, \ldots, y_{n+r})$ be any sequences of numbers such that

$$y_1 \le x_{m+1},\quad y_2 \le x_{m+2},\quad \ldots,\quad y_r \le x_{m+r}. \qquad (7)$$

If the x’s and y’s are sorted independently, so that $x_1 \le \cdots \le x_{m+r}$ and
$y_1 \le \cdots \le y_{n+r}$, the relations (7) will still be valid.

Proof. All but m of the x’s are known to dominate (that is, to be greater than
or equal to) some y, where distinct x’s dominate distinct y’s. Let $1 \le j \le r$.
Since $x_{m+j}$ after sorting dominates m + j of the x’s, it dominates at least j of
the y’s; therefore it dominates the smallest j of the y’s; hence $x_{m+j} \ge y_j$ after
sorting. | |
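Theorem K is easy to observe experimentally. The sketch below uses helper names of our own. It performs the h-sorting passes of a shellsort on random files with increments 7, 5, 3. After each pass it asserts that the file is still ordered by every increment used so far. A test like this illustrates the theorem but of course proves nothing.

```python
import random


def h_sort(a, h):
    """Straight insertion sort on each of the h interleaved subfiles of a (in place)."""
    for j in range(h, len(a)):
        key, i = a[j], j - h
        while i >= 0 and a[i] > key:
            a[i + h] = a[i]
            i -= h
        a[i + h] = key


def is_ordered(a, k):
    """True when a is k-ordered: a[i] <= a[i + k] for every valid i."""
    return all(a[i] <= a[i + k] for i in range(len(a) - k))


rng = random.Random(2024)
for trial in range(200):
    a = [rng.randrange(1000) for _ in range(rng.randrange(1, 60))]
    done = []
    for h in (7, 5, 3):
        h_sort(a, h)
        done.append(h)
        # Theorem K: h-sorting keeps every earlier ordering intact.
        assert all(is_ordered(a, k) for k in done)
```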

Theorem K suggests that it is desirable to sort with relatively prime increments,
but it does not lead directly to exact estimates of the number of moves
made in Algorithm D. Moreover, the number of permutations of $\{1, 2, \ldots, n\}$
that are both h-ordered and k-ordered is not always a divisor of n!, so we can see
that Theorem K does not tell the whole story; some k- and h-ordered files are
obtained more often than others after k- and h-sorting. Therefore the average-case
analysis of Algorithm D for general increments $h_{t-1}, \ldots, h_0$ has baffled
everyone so far when $t \ge 3$. There is not even an obvious way to find the worst
case, when N and $(h_{t-1}, \ldots, h_0)$ are given. We can, however, derive several
facts  about  the  approximate  maximum  running  time  when  the  increments  have 
certain  forms: 

Theorem P. The running time of Algorithm D is $O(N^{3/2})$, when $h_s = 2^{s+1} - 1$
for $0 \le s < t = \lfloor \lg N\rfloor$.

Proof. It suffices to bound $B_s$, the number of moves in pass s, in such a way
that $B_{t-1} + \cdots + B_0 = O(N^{3/2})$. During the first t/2 passes, for $t > s \ge t/2$,
we may use the obvious bound $B_s = O(h_s (N/h_s)^2)$; and for subsequent passes
we may use the result of exercise 23, $B_s = O(N h_{s+2} h_{s+1}/h_s)$. Consequently
$B_{t-1} + \cdots + B_0 = O(N(2 + 2^2 + \cdots + 2^{t/2} + 2^{t/2} + \cdots + 2)) = O(N^{3/2})$. |

This theorem is due to A. A. Papernov and G. V. Stasevich, Problemy
Peredachi Informatsii 1, 3 (1965), 81-98. It gives an upper bound on the worst-case
running time of the algorithm, not merely a bound on the average running
time. The result is not trivial since the maximum running time when the h’s
satisfy the divisibility constraint (5) is of order $N^2$; and exercise 24 shows that
the exponent 3/2 cannot be lowered.

An interesting improvement of Theorem P was discovered by Vaughan Pratt
in 1969: If the increments are chosen to be the set of all numbers of the form $2^p 3^q$
that are less than N, the running time of Algorithm D is of order $N(\log N)^2$. In
this case we can also make several important simplifications to the algorithm; see
exercises 30 and 31. However, even with these simplifications, Pratt’s method
requires a substantial overhead because it makes quite a few passes over the data.
Therefore his increments don’t actually sort faster than those of Theorem P in
practice, unless N is astronomically large. The best sequences for real-world N
appear to satisfy $h_s \approx \rho^s$, where the ratio $\rho \approx h_{s+1}/h_s$ is roughly independent
of s but may depend on N.
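For concreteness, here is a sketch of the two increment sequences just discussed, listed largest first as Algorithm D uses them. The function names are ours. For N = 1000 the first one yields 511 255 ... 3 1, which is the row of Table 6 with T = 9.

```python
def papernov_stasevich(N):
    """Theorem P increments h_s = 2^(s+1) - 1 for 0 <= s < t = floor(lg N), largest first."""
    t = N.bit_length() - 1          # floor(lg N) for N >= 1
    return [2 ** (s + 1) - 1 for s in reversed(range(t))]


def pratt(N):
    """Pratt's increments: every number 2^p 3^q less than N, largest first."""
    result = []
    power_of_2 = 1
    while power_of_2 < N:
        m = power_of_2
        while m < N:                # multiply in powers of 3
            result.append(m)
            m *= 3
        power_of_2 *= 2
    return sorted(result, reverse=True)


print(papernov_stasevich(1000))   # [511, 255, 127, 63, 31, 15, 7, 3, 1]
print(len(pratt(1000)))           # many more passes than Theorem P's 9
```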

We  have  observed  that  it  is  unwise  to  choose  increments  in  such  a way  that 
each  is  a divisor  of  all  its  predecessors;  but  we  should  not  conclude  that  the  best 
increments  are  relatively  prime  to  all  of  their  predecessors.  Indeed,  every  element 
of a file that is gh-sorted and gk-sorted with $h \perp k$ has at most $\frac{1}{2}(h-1)(k-1)$
inversions when we are g-sorting. (See exercise 21.) Pratt’s sequence $\{2^p 3^q\}$
wins as $N \to \infty$ by exploiting this fact, but it grows too slowly for practical use.

Janet  Incerpi  and  Robert  Sedgewick  [J.  Comp.  Syst.  Sci.  31  (1985),  210-224; 
see  also  Lecture  Notes  in  Comp.  Sci.  1136  (1996),  1-11]  have  found  a way  to  have 
the  best  of  both  worlds,  by  showing  how  to  construct  a sequence  of  increments 
for which $h_s \approx \rho^s$ yet each increment is the gcd of two of its predecessors. Given
any number $\rho > 1$, they start by defining a base sequence $a_1, a_2, \ldots$, where $a_k$ is
the least integer $\ge \rho^k$ such that $a_j \perp a_k$ for $1 \le j < k$. If $\rho = 2.5$, for example,
the base sequence is

$$a_1, a_2, a_3, \ldots = 3, 7, 16, 41, 101, 247, 613, 1529, 3821, 9539, \ldots.$$

Now they define the increments by setting $h_0 = 1$ and

$$h_s = h_{s-r}\, a_r \qquad \text{for } \binom{r}{2} < s \le \binom{r+1}{2}. \qquad (8)$$

Thus the sequence of increments starts

$$1;\ a_1;\ a_2,\ a_1a_2;\ a_1a_3,\ a_2a_3,\ a_1a_2a_3;\ \ldots.$$

For example, when $\rho = 2.5$ we get

1, 3, 7, 21, 48, 112, 336, 861, 1968, 4592, 13776, 33936, 86961, 198768, ....
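The construction of the base sequence and of (8) can be sketched as follows, with names of our own. Exact fractions are used so that $\lceil\rho^k\rceil$ is not disturbed by floating-point rounding. This sketch assumes ρ should be read from its decimal representation. It reproduces the numbers given above for ρ = 2.5. For ρ = 2 it yields 1, 2, 5, 10, 18, 45, 90, 170, 306, ..., whose first nine terms form one of the rows of Table 6.

```python
from fractions import Fraction
from math import ceil, gcd


def base_sequence(rho, count):
    """a_k = least integer >= rho^k that is relatively prime to a_1, ..., a_{k-1}."""
    rho = Fraction(str(rho))            # exact powers, so ceil() sees the true value
    a = []
    for k in range(1, count + 1):
        candidate = ceil(rho ** k)
        while any(gcd(candidate, prev) != 1 for prev in a):
            candidate += 1
        a.append(candidate)
    return a


def incerpi_sedgewick(rho, count):
    """First `count` increments: h_0 = 1 and h_s = h_{s-r} a_r for C(r,2) < s <= C(r+1,2)."""
    a = base_sequence(rho, count)       # a[r - 1] holds a_r; count terms are more than enough
    h = [1]
    r = 1
    for s in range(1, count):
        while s > r * (r + 1) // 2:     # advance r until C(r,2) < s <= C(r+1,2)
            r += 1
        h.append(h[s - r] * a[r - 1])
    return h


print(incerpi_sedgewick(2.5, 14))
print(incerpi_sedgewick(2, 9))   # [1, 2, 5, 10, 18, 45, 90, 170, 306]
```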
The  crucial  point  is  that  we  can  turn  recurrence  (8)  around: 

$$h_s = h_{r+s}/a_r = h_{\binom{r}{2}}/a_{\binom{r}{2}-s} \qquad \text{for } \binom{r-1}{2} \le s < \binom{r}{2}. \qquad (9)$$

Therefore, by the argument in the previous paragraph, the number of inversions
per element when we are $h_0$-sorting, $h_1$-sorting, ... is at most

$$b(a_2, a_1);\ b(a_3, a_2),\ b(a_3, a_1);\ b(a_4, a_3),\ b(a_4, a_2),\ b(a_4, a_1);\ \ldots \qquad (10)$$

where $b(h, k) = \frac{1}{2}(h-1)(k-1)$. If $\rho^{t-1} < N \le \rho^t$, the total number B of moves
is at most N times the sum of the first t elements of this sequence. Therefore
(see exercise 41) we can prove that the worst-case running time is much better
than order $N^{1.5}$:

Theorem I. The running time for Algorithm D is $O(N e^{c\sqrt{\ln N}})$ when the
increments $h_s$ are defined by (8). Here $c = \sqrt{8\ln\rho}$ and the constant implied by O
depends on $\rho$. |
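The bound "N times the sum of the first t elements of (10)" can be evaluated directly from the base sequence. In this sketch the names are ours. `bound_terms` lists sequence (10) in order, and `move_bound` applies the rule $\rho^{t-1} < N \le \rho^t$. It only evaluates the stated upper bound and says nothing about actual move counts.

```python
from fractions import Fraction


def b(h, k):
    """Per-element inversion bound (h - 1)(k - 1)/2; exact because h and k are coprime."""
    return (h - 1) * (k - 1) // 2


def bound_terms(a):
    """Sequence (10): b(a2,a1); b(a3,a2), b(a3,a1); b(a4,a3), b(a4,a2), b(a4,a1); ..."""
    terms = []
    for r in range(1, len(a)):          # group r pairs a_{r+1} (a[r]) with a_r, ..., a_1
        for k in range(r, 0, -1):
            terms.append(b(a[r], a[k - 1]))
    return terms


def move_bound(N, rho, a):
    """N times the sum of the first t terms of (10), where rho^(t-1) < N <= rho^t."""
    rho, t = Fraction(str(rho)), 0
    while rho ** t < N:
        t += 1
    terms = bound_terms(a)
    if t > len(terms):
        raise ValueError("base sequence too short for this N")
    return N * sum(terms[:t])


a = [3, 7, 16, 41, 101, 247, 613, 1529, 3821, 9539]   # base sequence for rho = 2.5
print(bound_terms(a)[:6])          # [6, 45, 15, 300, 120, 40]
print(move_bound(1000, 2.5, a))    # 3276000
```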

This asymptotic upper bound is not especially important as $N \to \infty$,
because Pratt’s sequence does better. The main point of Theorem I is that
a sequence of increments with the practical growth rate $h_s \approx \rho^s$ can have a
running time that is guaranteed to be $O(N^{1+\epsilon})$ for arbitrarily small $\epsilon > 0$, when
any value $\rho > 1$ is given.

Let’s consider practical sizes of N more carefully by looking at the total
running time of Program D, namely $(9B + 10NT + 13T - 10S - 3A + 1)u$. Table 5
shows the average running time for various sequences of increments when N = 8.
For this small value of N, bookkeeping operations are the most significant part
of the cost, and the best results are obtained when t = 1; hence for N = 8
we are better off using simple straight insertion. (The average running time of
Program S when N = 8 is only 191.85u.) Curiously, the best two-pass algorithm
occurs when $h_1 = 6$, since a large value of S is more important here than a
small value of B. Similarly, the three increments 3 2 1 minimize the average
number of moves, but they do not lead to the best three-pass sequence. It may
be of interest to record here some “worst-case” permutations that maximize the
number of moves, since the general construction of such permutations is still
unknown:

$h_2 = 5$, $h_1 = 3$, $h_0 = 1$:   8 5 2 6 3 7 4 1   (19 moves)

$h_2 = 3$, $h_1 = 2$, $h_0 = 1$:   8 3 5 7 2 4 6 1   (17 moves)
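The move counts above are easy to reproduce. This sketch uses names of our own. It performs the passes of Algorithm D as straight insertion on interleaved subfiles and counts one move for every record shifted. Exhaustive enumeration over all 8! inputs can then confirm that 19 and 17 are the maxima. The average for increments 2 1 agrees with the $B_{ave}$ entry 9.657 of Table 5.

```python
from itertools import permutations


def shellsort_moves(keys, increments):
    """Shellsort with the given increments (largest first); returns (result, B).

    Each pass is a straight insertion sort of the interleaved subfiles, and B
    counts every shift of a record by h places, i.e. every move.
    """
    a = list(keys)
    moves = 0
    for h in increments:
        for j in range(h, len(a)):
            key, i = a[j], j - h
            while i >= 0 and a[i] > key:
                a[i + h] = a[i]
                i -= h
                moves += 1
            a[i + h] = key
    return a, moves


def worst_case(increments, n):
    """Largest move count over all n! permutations (feasible only for small n)."""
    return max(shellsort_moves(p, increments)[1] for p in permutations(range(1, n + 1)))


def average_moves(increments, n):
    """Average move count over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    return sum(shellsort_moves(p, increments)[1] for p in perms) / len(perms)


print(shellsort_moves([8, 5, 2, 6, 3, 7, 4, 1], [5, 3, 1]))   # ([1, 2, 3, 4, 5, 6, 7, 8], 19)
print(shellsort_moves([8, 3, 5, 7, 2, 4, 6, 1], [3, 2, 1]))   # ([1, 2, 3, 4, 5, 6, 7, 8], 17)
print(round(average_moves([2, 1], 8), 3))                      # 9.657
```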
Table 5

ANALYSIS OF ALGORITHM D WHEN N = 8

Increments  A_ave    B_ave    S    T   MIX time
1           1.718   14.000    1    1    204.85u
2 1         2.667    9.657    3    2    235.91u
3 1         2.917    9.100    4    2    220.15u
4 1         3.083   10.000    5    2    217.75u
5 1         2.601   10.000    6    2    209.20u
6 1         2.135   10.667    7    2    206.60u
7 1         1.718   12.000    8    2    209.85u
4 2 1       3.500    8.324    7    3    274.42u
5 3 1       3.301    8.167    9    3    253.60u
3 2 1       3.320    7.829    6    3    280.50u
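As a check on the timing formula, this sketch (names ours) recomputes the MIX time column of Table 5 from the tabulated $A_{ave}$ and $B_{ave}$. It uses N = 8, with T the number of increments and S their sum. Agreement is to within the rounding of the table entries.

```python
def mix_time(N, A, B, increments):
    """Running time (9B + 10NT + 13T - 10S - 3A + 1)u of Program D."""
    T, S = len(increments), sum(increments)
    return 9 * B + 10 * N * T + 13 * T - 10 * S - 3 * A + 1


rows = {  # increments: (A_ave, B_ave), copied from Table 5
    (1,): (1.718, 14.000),
    (6, 1): (2.135, 10.667),
    (3, 2, 1): (3.320, 7.829),
}
for inc, (A, B) in rows.items():
    print(inc, round(mix_time(8, A, B, inc), 2))
# (1,) 204.85
# (6, 1) 206.6
# (3, 2, 1) 280.5
```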

As  N grows  larger  we  have  a slightly  different  picture.  Table  6 shows 
the  approximate  number  of  moves  for  various  sequences  of  increments  when 
N = 1000.  The  first  few  entries  satisfy  the  divisibility  constraints  (5),  so 
that  formula  (6)  and  exercise  19  can  be  used;  empirical  tests  were  used  to 
get  approximate  average  values  for  the  other  cases.  Ten  thousand  random  files 
of  1000  elements  were  generated,  and  they  each  were  sorted  with  each  of  the 
sequences  of  increments.  The  standard  deviation  of  the  number  of  left-to-right 
minima  A was  usually  about  15;  the  standard  deviation  of  the  number  of  moves 
B was  usually  about  300. 

Some patterns are evident in this data, but the behavior of Algorithm D still
remains very obscure. Shell originally suggested using the increments $\lfloor N/2\rfloor$,
$\lfloor N/4\rfloor$, ..., but this is undesirable when the binary representation of N
contains a long string of zeros. Lazarus and Frank [CACM 3 (1960), 20-22]
suggested using essentially the same sequence, but adding 1 when necessary,
to make all increments odd. Hibbard [CACM 6 (1963), 206-213] suggested
using increments of the form $2^k - 1$; Papernov and Stasevich suggested the form
$2^k + 1$. Other natural sequences investigated in Table 6 involve the numbers
$(2^k - (-1)^k)/3$ and $(3^k - 1)/2$, as well as Fibonacci numbers and the Incerpi-Sedgewick
sequences (8) for $\rho = 2.5$ and $\rho = 2$. Pratt-like sequences $\{5^p 11^q\}$
and $\{7^p 13^q\}$ are also shown, because they retain the asymptotic $O(N(\log N)^2)$
behavior but have lower overhead costs for small N. The final examples in
Table 6 come from another sequence devised by Sedgewick, based on slightly
different heuristics [J. Algorithms 7 (1986), 159-173]:

$$h_s = \begin{cases} 9\cdot 2^s - 9\cdot 2^{s/2} + 1, & \text{if } s \text{ is even};\\ 8\cdot 2^s - 6\cdot 2^{(s+1)/2} + 1, & \text{if } s \text{ is odd}. \end{cases} \qquad (11)$$

When these increments $(h_0, h_1, h_2, \ldots) = (1, 5, 19, 41, 109, 209, \ldots)$ are used,
Sedgewick proved that the worst-case running time is $O(N^{4/3})$.
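Formula (11) is simple to tabulate. This sketch (name ours) lists $h_0, h_1, \ldots$ in increasing order. Which term to start from for a given N is a separate choice, and Tables 6 and 7 show several.

```python
def sedgewick_1986(count):
    """First `count` increments of (11), smallest first."""
    h = []
    for s in range(count):
        if s % 2 == 0:
            h.append(9 * 2 ** s - 9 * 2 ** (s // 2) + 1)
        else:
            h.append(8 * 2 ** s - 6 * 2 ** ((s + 1) // 2) + 1)
    return h


print(sedgewick_1986(10))   # [1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905]
```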

The minimum number of moves, about 6750, was observed for increments
of the form $2^k + 1$, and also in the Incerpi-Sedgewick sequence for $\rho = 2$. But it
is important to realize that the number of moves is not the only consideration,
even though it dominates the asymptotic running time. Since Program D takes
$9B + 10(NT - S) + \cdots$ units of time, we see that saving one pass is about as
desirable as saving $\frac{10}{9}N$ moves; when N = 1000 we are willing to add 1111 moves
if we can save one pass. (The first pass is very quick, however, if $h_{t-1}$ is near N,
because $NT - S = (N - h_{t-1}) + \cdots + (N - h_0)$.)

Table 6

APPROXIMATE BEHAVIOR OF ALGORITHM D WHEN N = 1000

A_ave    B_ave    T   Increments
    6   249750    1   1
   65    41667    2   17 1
  158    26361    3   60 6 1
  262    21913    4   140 20 4 1
  362    20459    5   256 64 16 4 1
  419    20088    6   576 192 48 16 4 1
  378    18533    7   729 243 81 27 9 3 1
  493    16435   10   512 256 128 64 32 16 8 4 2 1
  516     7655    9   500 250 125 62 31 15 7 3 1
  558     7370    9   501 251 125 63 31 15 7 3 1
  559     7200    9   511 255 127 63 31 15 7 3 1
  436     7445    8   255 127 63 31 15 7 3 1
  299     8170    7   127 63 31 15 7 3 1
  190     9860    6   63 31 15 7 3 1
  114    13615    5   31 15 7 3 1
  561     6745   10   513 257 129 65 33 17 9 5 3 1
  440     6995    9   257 129 65 33 17 9 5 3 1
  304     7700    8   129 65 33 17 9 5 3 1
  197     9300    7   65 33 17 9 5 3 1
  122    12695    6   33 17 9 5 3 1
  511     7365   10   683 341 171 85 43 21 11 5 3 1
  490     7490    9   341 171 85 43 21 11 5 3 1
  373     8620    6   255 63 15 7 3 1
  375     8990    6   257 65 17 5 3 1
  410     9345    6   341 85 21 5 3 1
  518     7400   13   377 233 144 89 55 34 21 13 8 5 3 2 1
  432     7610   12   233 144 89 55 34 21 13 8 5 3 2 1
  456     8795    7   377 144 55 21 8 3 1
  440     8085    7   365 122 41 14 5 2 1
  437     8900    6   364 121 40 13 4 1
  268     9790    5   121 40 13 4 1
  432     7840    7   336 112 48 21 7 3 1
  465     6755    9   306 170 90 45 18 10 5 2 1
  349     8698    6   169 91 49 13 7 1
  446     6788    8   275 125 121 55 25 11 5 1
  359     7201    7   190 84 37 16 7 3 1
  512     7725    8   929 505 209 109 41 19 5 1
  519     7790    7   505 209 109 41 19 5 1
  382     8165    6   209 109 41 19 5 1
Empirical tests conducted by M. A. Weiss [Comp. J. 34 (1991), 88-91]
suggest strongly that the average number of moves performed by Algorithm D
with increments $2^k - 1, \ldots, 15, 7, 3, 1$ is approximately proportional to $N^{5/4}$.
More precisely, Weiss found that $B_{ave} \approx 1.55N^{5/4} - 4.48N + O(N^{3/4})$ for
$100 \le N \le 12000000$ when these increments are used; the empirical standard
deviation was approximately $.065N^{5/4}$. On the other hand, subsequent tests by
Marcin Ciura show that Sedgewick’s sequence (11) apparently makes
$B_{ave} = O(N(\log N)^2)$ or better. The standard deviation for sequence (11) is amazingly
small for $N \le 10^6$, but it mysteriously begins to “explode” when N passes $10^7$.

Table 7 shows typical breakdowns of moves per pass obtained in three
random experiments, using increments of the forms $2^k - 1$, $2^k + 1$, and (11).
The same file of numbers was used in each case. The total number of moves,
$\sum_s B_s$, comes to 346152, 329532, 248788 in the three cases, so sequence (11) is
clearly superior in this example.


Table 7

MOVES PER PASS: EXPERIMENTS WITH N = 20000

 h_s     B_s    h_s     B_s    h_s     B_s
4095   19458   4097   19459   3905   20714
2047   15201   2049   14852   2161   13428
1023   16363   1025   15966    929   18206
 511   18867    513   18434    505   16444
 255   23232    257   22746    209   21405
 127   28034    129   27595    109   19605
  63   33606     65   34528     41   26604
  31   40350     33   45497     19   23441
  15   66037     17   48717      5   38941
   7   43915      9   38560      1   50000
   3   24191      5   20271
   1   16898      3    9448
                  1   13459
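The totals quoted for Table 7 can be confirmed by adding the columns. The lists below simply copy the $B_s$ values from the table, largest increment first.

```python
table7 = {
    "2^k - 1": [19458, 15201, 16363, 18867, 23232, 28034, 33606, 40350, 66037, 43915, 24191, 16898],
    "2^k + 1": [19459, 14852, 15966, 18434, 22746, 27595, 34528, 45497, 48717, 38560, 20271, 9448, 13459],
    "(11)": [20714, 13428, 18206, 16444, 21405, 19605, 26604, 23441, 38941, 50000],
}
for name, moves in table7.items():
    print(name, len(moves), sum(moves))
# 2^k - 1 12 346152
# 2^k + 1 13 329532
# (11) 10 248788
```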

Although Algorithm D is gradually becoming better understood, more than
three decades of research have failed to turn up any grounds for making strong
assertions about what sequences of increments make it work best. If N is less
than 1000, a simple rule such as

$$\text{Let } h_0 = 1,\ h_{s+1} = 3h_s + 1, \text{ and stop with } h_{t-1} \text{ when } h_{t+1} \ge N \qquad (12)$$

seems to be about as good as any other. For larger values of N, Sedgewick’s
sequence (11) can be recommended. Still better results, possibly even of order
$N \log N$, have been reported by N. Tokuda using the quantity $\lfloor 2.25h_s\rfloor$ in place
of $3h_s$ in (12); see Information Processing 92 1 (1992), 449-457.
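Here is rule (12) as a minimal sketch, with names of our own. The step function is a parameter, so that Tokuda's variant can be plugged in. This sketch reads that variant as $h_{s+1} = \lfloor 2.25h_s\rfloor + 1$. For N < 5 the literal rule would leave no passes at all, so the sketch adds a guard of our own that always keeps $h_0 = 1$. For N = 1000 the rule gives 121 40 13 4 1, one of the rows of Table 6. The Table 6 row 190 84 37 16 7 3 1 consists of the first terms of the Tokuda-style recurrence.

```python
def knuth_step(h):
    return 3 * h + 1


def tokuda_step(h):
    """floor(2.25 h) + 1, computed exactly with integers."""
    return 9 * h // 4 + 1


def rule_12_increments(N, step=knuth_step):
    """h_0 = 1, h_{s+1} = step(h_s); stop with h_{t-1} when h_{t+1} >= N. Largest first."""
    h = [1]
    while h[-1] < N:            # extend until the first term >= N, which is h_{t+1}
        h.append(step(h[-1]))
    t = max(1, len(h) - 2)      # assumed guard: always keep h_0 = 1
    return h[:t][::-1]


print(rule_12_increments(1000))                # [121, 40, 13, 4, 1]
print(rule_12_increments(1000, tokuda_step))   # [428, 190, 84, 37, 16, 7, 3, 1]
```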

List insertion. Let us now leave shellsort and consider other types of improvements
over straight insertion. One of the most important general ways to
improve on a given algorithm is to examine its data structures carefully, since
a reorganization of data structures to avoid unnecessary operations often leads
to  substantial  savings.  Further  discussion  of  this  general  idea  appears  in  Section 
2.4,  where  a rather  complex  algorithm  is  studied;  let  us  consider  how  it  applies 
to  a very  simple  algorithm  like  straight  insertion.  What  is  the  most  appropriate 
data  structure  for  Algorithm  S? 

Straight  insertion  involves  two  basic  operations: 

i)  scanning  an  ordered  file  to  find  the  largest  key  less  than  or  equal  to  a given 
key;  and 

ii)  inserting  a new  record  into  a specified  part  of  the  ordered  file. 

The  file  is  obviously  a linear  list,  and  Algorithm  S handles  this  list  by  using 
sequential  allocation  (Section  2.2.2);  therefore  it  is  necessary  to  move  roughly 
half  of  the  records  in  order  to  accomplish  each  insertion  operation.  On  the 
other  hand,  we  know  that  linked  allocation  (Section  2.2.3)  is  ideally  suited  to 
insertion,  since  only  a few  links  need  to  be  changed;  and  the  other  operation, 
sequential  scanning,  is  about  as  easy  with  linked  allocation  as  with  sequential 
allocation.  Only  one-way  linkage  is  needed,  since  we  always  scan  the  list  in  the 
same  direction.  Therefore  we  conclude  that  the  right  data  structure  for  straight 
insertion  is  a one-way,  linked  linear  list.  It  also  becomes  convenient  to  revise 
Algorithm  S so  that  the  list  is  scanned  in  increasing  order: 

Algorithm L (List insertion). Records $R_1, \ldots, R_N$ are assumed to contain keys
$K_1, \ldots, K_N$, together with link fields $L_1, \ldots, L_N$ capable of holding the numbers
0 through N; there is also an additional link field $L_0$, in an artificial record
$R_0$ at the beginning of the file. This algorithm sets the link fields so that the
records are linked together in ascending order. Thus, if $p(1) \ldots p(N)$ is the stable
permutation that makes $K_{p(1)} \le \cdots \le K_{p(N)}$, this algorithm will yield

$$L_0 = p(1);\quad L_{p(i)} = p(i+1),\ \text{for } 1 \le i < N;\quad L_{p(N)} = 0. \qquad (13)$$

L1. [Loop on j.] Set $L_0 \leftarrow N$, $L_N \leftarrow 0$. (Link $L_0$ acts as the “head” of the list,
and 0 acts as a null link; hence the list is essentially circular.) Perform steps
L2 through L5 for j = N − 1, N − 2, ..., 1; then terminate the algorithm.

L2. [Set up p, q, K.] Set $p \leftarrow L_0$, $q \leftarrow 0$, $K \leftarrow K_j$. (In the following steps we
will insert $R_j$ into its proper place in the linked list, by comparing K with
the previous keys in ascending order. The variables p and q act as pointers
to the current place in the list, with $p = L_q$ so that q is one step behind p.)

L3. [Compare K : K_p.] If $K \le K_p$, go to step L5. (We have found the desired
position for record $R_j$, between $R_q$ and $R_p$ in the list.)

L4. [Bump p, q.] Set $q \leftarrow p$, $p \leftarrow L_q$. If p > 0, go back to step L3. (If p = 0,
K is the largest key found so far; hence record $R_j$ belongs at the end of the
list, between $R_q$ and $R_0$.)

L5. [Insert into list.] Set $L_q \leftarrow j$, $L_j \leftarrow p$. |
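Here is a direct transcription of Algorithm L into Python, as a sketch with names of our own. Index 0 of the key list is a placeholder for the artificial record $R_0$, so that `keys[1..N]` are $K_1 \ldots K_N$. Besides the links, the function returns two counts. A is the number of times step L4 reaches p = 0, and B is the number of executions of step L4. These are the quantities that enter the running time of Program L. The optional `steps` argument stops early, so that the rows of Table 8 can be reproduced.

```python
def list_insertion(keys, steps=None):
    """Algorithm L on keys[1..N]; returns (L, A, B).

    L is the link array (L[0] is the list head, 0 is the null link), A counts
    insertions at the end of the list (step L4 reaching p = 0) and B counts
    executions of step L4. If `steps` is given, only that many values of j
    are processed.
    """
    N = len(keys) - 1
    L = [None] * (N + 1)
    L[0], L[N] = N, 0                    # L1: the list starts with R_N alone
    A = B = 0
    for count, j in enumerate(range(N - 1, 0, -1)):
        if steps is not None and count >= steps:
            break
        p, q, K = L[0], 0, keys[j]       # L2
        while True:
            if K <= keys[p]:             # L3: R_j belongs between R_q and R_p
                break
            q, p = p, L[p]               # L4: bump p, q
            B += 1
            if p == 0:                   # K exceeds every key already in the list
                A += 1
                break
        L[q], L[j] = j, p                # L5: splice R_j in after R_q
    return L, A, B


def linked_order(L):
    """Follow the links from L[0] and return the record indices in order."""
    order, p = [], L[0]
    while p != 0:
        order.append(p)
        p = L[p]
    return order


keys = [None, 503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
L, A, B = list_insertion(keys)
print([keys[i] for i in linked_order(L)])
```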

This  algorithm  is  important  not  only  because  it  is  a simple  sorting  method, 
but  also  because  it  occurs  frequently  as  part  of  other  list-processing  algorithms. 
Table 8 shows the first few steps that occur when our sixteen example numbers
are sorted; exercise 32 gives the final link setting.


Table 8

EXAMPLE OF LIST INSERTION

j:      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
K_j:    - 503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
L_j:   16   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0
L_j:   16   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0  15
L_j:   14   -   -   -   -   -   -   -   -   -   -   -   -   -  16   0  15


Program L (List insertion). We assume that $K_j$ is stored in INPUT+j(0:3),
and $L_j$ is stored in INPUT+j(4:5). rI1 ≡ j; rI2 ≡ p; rI3 ≡ q; rA(0:3) ≡ K.

    KEY    EQU   0:3
    LINK   EQU   4:5
    START  ENT1  N              1          L1. Loop on j. j ← N.
           ST1   INPUT(LINK)    1          L0 ← N.
           STZ   INPUT+N(LINK)  1          LN ← 0.
           JMP   6F             1          Go to decrease j.
    2H     LD2   INPUT(LINK)    N-1        L2. Set up p, q, K. p ← L0.
           ENT3  0              N-1        q ← 0.
           LDA   INPUT,1        N-1        K ← Kj.
    3H     CMPA  INPUT,2(KEY)   B+N-1-A    L3. Compare K : Kp.
           JLE   5F             B+N-1-A    To L5 if K ≤ Kp.
    4H     ENT3  0,2            B          L4. Bump p, q. q ← p.
           LD2   INPUT,3(LINK)  B          p ← Lq.
           J2P   3B             B          To L3 if p > 0.
    5H     ST1   INPUT,3(LINK)  N-1        L5. Insert into list. Lq ← j.
           ST2   INPUT,1(LINK)  N-1        Lj ← p.
    6H     DEC1  1              N
           J1P   2B             N          N > j ≥ 1. |

The running time of this program is $7B + 14N - 3A - 6$ units, where N is
the length of the file, A + 1 is the number of right-to-left maxima, and B is the
number of inversions in the original permutation. (See the analysis of Program S.
Note that Program L does not rearrange the records in memory; this can be done
as in exercise 5.2-12, at a cost of about 20N additional units of time.) Program S
requires $(9B + 10N - 3A - 9)u$, and since B is about $\frac{1}{4}N^2$, we can see that the
extra memory space used for the link fields has saved about 22 percent of the
execution time. Another 22 percent can be saved by careful programming (see
exercise 33), but the running time remains proportional to $N^2$.

To summarize what we have done so far: We started with Algorithm S,
a simple and natural sorting algorithm that does about $\frac{1}{4}N^2$ comparisons and
$\frac{1}{4}N^2$ moves. We improved it in one direction by considering binary insertion,
which does about $N \lg N$ comparisons and $\frac{1}{4}N^2$ moves. Changing the data
structure slightly with “two-way insertion” cuts the number of moves down
to about $\frac{1}{8}N^2$. Shellsort cuts the number of comparisons and moves to about
$N^{7/6}$, for N in a practical range; as $N \to \infty$ this number can be lowered to
order $N(\log N)^2$. Another way to improve on Algorithm S, using a linked data
structure, gave us the list insertion method, which does about $\frac{1}{4}N^2$ comparisons,
0 moves, and 2N changes of links.

Fig. 13. Example of Wheeler’s tree insertion scheme.

Is  it  possible  to  marry  the  best  features  of  these  methods,  reducing  the 
number  of  comparisons  to  order  TV  log  TV  as  in  binary  insertion,  yet  reducing 
the  number  of  moves  as  in  list  insertion?  The  answer  is  yes,  by  going  to  a 
tree-structured  arrangement.  This  possibility  was  first  explored  about  1957  by 
D.  J.  Wheeler,  who  suggested  using  two-way  insertion  until  it  becomes  necessary 
to  move  some  data;  then  instead  of  moving  the  data,  a pointer  to  another  area 
of  memory  is  inserted,  and  the  same  technique  is  applied  recursively  to  all  items 
that  are  to  be  inserted  into  this  new  area  of  memory.  Wheeler’s  original  method 
[see  A.  S.  Douglas,  Comp.  J.  2 (1959),  5]  was  a complicated  combination  of 
sequential  and  linked  memory,  with  nodes  of  varying  size;  for  our  16  example 
numbers  the  tree  of  Fig.  13  would  be  formed.  A similar  but  simpler  tree-insertion 
scheme,  using  binary  trees,  was  devised  by  C.  M.  Berners-Lee  about  1958  [see 
Comp. J. 3 (1960), 174, 184]. Since the binary tree method and its refinements
are  quite  important  for  searching  as  well  as  sorting,  they  are  discussed  at  length 
in  Section  6.2.2. 


Still another way to improve on straight insertion is to consider inserting
several things at a time. For example, if we have a file of 1000 items, and
if 998 of them have already been sorted, Algorithm S makes two more passes
through the file (first inserting $R_{999}$, then $R_{1000}$). We can obviously save time
if we compare $K_{999}$ with $K_{1000}$, to see which is larger, then insert them both
with one look at the file. A combined operation of this kind involves about $\frac{2}{3}N$
comparisons and moves (see exercise 3.4.2-5), instead of two passes each with
about $\frac{1}{2}N$ comparisons and moves.
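The combined operation can be sketched as follows (names ours). The two new keys are compared once. A single right-to-left scan then shifts records by two places until the larger key's slot is found, and by one place until the smaller key's slot is found. This is an illustration for Python lists, not a transcription of any program in the text.

```python
def insert_pair(a, x, y):
    """Insert keys x and y into the sorted list a using one right-to-left scan."""
    big, small = (x, y) if x >= y else (y, x)   # one comparison between the new keys
    a.extend([None, None])                      # room for the two new records
    i = len(a) - 3                              # last of the original records
    while i >= 0 and a[i] > big:                # shift by two until big's slot appears
        a[i + 2] = a[i]
        i -= 1
    a[i + 2] = big
    while i >= 0 and a[i] > small:              # keep scanning, now shifting by one
        a[i + 1] = a[i]
        i -= 1
    a[i + 1] = small
    return a


print(insert_pair([1, 3, 5], 4, 2))   # [1, 2, 3, 4, 5]
```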


In other words, it is generally a good idea to “batch” operations that require
long searches, so that multiple operations can be done together. If we carry this
idea to its natural conclusion, we rediscover the method of sorting by merging,
which is so important it is discussed in Section 5.2.4.
Address calculation sorting. Surely by now we have exhausted all possible
ways to improve on the simple method of straight insertion; but let’s look again!
Suppose you want to arrange several dozen books on your bookshelves, in order
by authors’ names, when the books are given to you in random order. You’ll
naturally try to estimate the final position of each book as you put it in place,
thereby reducing the number of comparisons and moves that you’ll have to make.
And the whole process will be somewhat more efficient if you start with a little
more shelf space than is absolutely necessary. This method was first suggested
for computer sorting by Isaac and Singleton, JACM 3 (1956), 169-174, and it
was developed further by Tarter and Kronmal, Proc. ACM National Conference
21 (1966), 331-337.

Address calculation sorting usually requires additional storage space proportional
to $N$, either to leave enough room so that excessive moving is not required,
or to maintain auxiliary tables that account for irregularities in the distribution
of keys. (See the “distribution counting” sort, Algorithm 5.2D, which is a form
of address calculation.) We can probably make the best use of this additional
memory space if we devote it to link fields, as in the list insertion method. In this
way we can also avoid having separate areas for input and output; everything
can be done in the same area of memory.

These considerations suggest that we generalize list insertion so that several
lists are kept, not just one. Each list is used for certain ranges of keys. We
make the important assumption that the keys are pretty evenly distributed, not
“bunched up” irregularly: The set of all possible values of the keys is partitioned
into $M$ parts, and we assume a probability of $1/M$ that a given key falls into a
given part. Then we provide additional storage for $M$ list heads, and each list
is maintained as in simple list insertion.

It is not necessary to give the algorithm in great detail here; the method
simply begins with all list heads set to Λ. As each new item enters, we first decide
which of the $M$ parts its key falls into, then we insert it into the corresponding
list as in Algorithm L.

To illustrate this approach, suppose that the 16 keys used in our examples
are divided into the $M = 4$ ranges 0-249, 250-499, 500-749, 750-999. We
obtain the following configurations as the keys $K_1, K_2, \ldots, K_{16}$ are successively
inserted:

          After 4 items:   After 8 items:    After 12 items:      Final state:
List 1:   061, 087         061, 087, 170     061, 087, 154, 170   061, 087, 154, 170
List 2:                    275               275, 426             275, 426
List 3:   503, 512         503, 512          503, 509, 512, 653   503, 509, 512, 612, 653, 677, 703
List 4:                    897, 908          897, 908             765, 897, 908

(Program M below actually inserts the keys in reverse order, $K_{16}, \ldots, K_2, K_1$,
but the final result is the same.) Because linked memory is used, the varying-length
lists cause no storage allocation problem. All lists can be combined into
a single list at the end, if desired (see exercise 35).
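The configurations above can be reproduced with a short Python sketch of multiple list insertion. It is a simplification under stated assumptions: Python lists stand in for the linked lists (there are no LINK fields), keys are inserted in input order rather than in reverse as Program M does, and the part number is computed as `m * k // key_range`, which plays the role of multiplying by a suitable constant. The function names are hypothetical. Leading zeros are dropped from the keys because Python rejects literals such as `061`.

```python
def multiple_list_insertion(keys, m, key_range):
    """Distribute keys (0 <= key < key_range) over m lists, each kept sorted."""
    lists = [[] for _ in range(m)]          # stand-ins for the M list heads
    for k in keys:
        part = m * k // key_range           # which of the m equal ranges k falls into
        lst = lists[part]
        i = 0
        # Straight list insertion within one list: skip past keys <= k.
        while i < len(lst) and lst[i] <= k:
            i += 1
        lst.insert(i, k)
    return lists


def combine(lists):
    """Concatenate the lists; the ranges are ordered, so the result is sorted."""
    return [k for lst in lists for k in lst]


KEYS = [503, 87, 512, 61, 908, 170, 897, 275,
        653, 426, 154, 509, 612, 677, 765, 703]
for count in (4, 8, 12, 16):
    print(count, multiple_list_insertion(KEYS[:count], 4, 1000))
print(combine(multiple_list_insertion(KEYS, 4, 1000)))
```

`combine` corresponds to the optional final step of linking all lists into one.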
Program M (Multiple list insertion). In this program we make the same
assumptions as in Program L, except that the keys must be nonnegative, thus

$$0 \le K_j < (\mathrm{BYTESIZE})^3.$$

The program divides this range into $M$ equal parts by multiplying each key by a
suitable constant. The list heads are in locations HEAD+1 through HEAD+M.


01  KEY    EQU   1:3
02  LINK   EQU   4:5
03  START  ENT2  M                  1
04         STZ   HEAD,2             M        HEAD[p] ← Λ.
05         DEC2  1                  M
06         J2P   *-2                M        M ≥ p ≥ 1.
07         ENT1  N                  1        j ← N.
08  2H     LDA   INPUT,1(KEY)       N
09         MUL   =M(1:3)=           N        rA ← ⌊M · K_j / BYTESIZE^3⌋.
10         STA   *+1(1:2)           N
11         ENT4  0                  N        rI4 ← rA.
12         ENT3  HEAD+1-INPUT,4     N        q ← LOC(HEAD[rA]).
13         LDA   INPUT,1            N        K ← K_j.
14         JMP   4F                 N        Jump to set p.
15  3H     CMPA  INPUT,2(KEY)       B+N-A
16         JLE   5F                 B+N-A    Jump to insert, if K ≤ K_p.
17         ENT3  0,2                B        q ← p.
18  4H     LD2   INPUT,3(LINK)      B+N      p ← LINK(q).
19         J2P   3B                 B+N      Jump if not end of list.
20  5H     ST1   INPUT,3(LINK)      N        LINK(q) ← LOC(R_j).
21         ST2   INPUT,1(LINK)      N        LINK(LOC(R_j)) ← p.
22  6H     DEC1  1                  N
23         J1P   2B                 N        N ≥ j ≥ 1.  |

This program is written for general $M$, but it would be better to fix $M$
at some convenient value; for example, we might choose $M$ = BYTESIZE, so
that the list heads could be cleared with a single MOVE instruction and the
multiplication sequence of lines 08-11 could be replaced by the single instruction
LD4 INPUT,1(1:1). The most notable contrast between Program L and
Program M is the fact that Program M must consider the case of an empty list,
when no comparisons are to be made.


How much time do we save by having $M$ lists? The total running time of
Program M is $7B + 31N - 3A + 4M + 2$ units, where $M$ is the number of lists
and $N$ is the number of records sorted; $A$ and $B$ respectively count the right-to-left
maxima and the inversions present among the keys belonging to each list.
(In contrast to other time analyses of this section, the rightmost element of a
nonempty permutation is included in the count $A$.) We have already studied
$A$ and $B$ for $M = 1$, when their average values are respectively $H_N$ and $\frac12\binom{N}{2}$.
By our assumption about the distribution of keys, the probability that a given
list contains precisely $n$ items at the conclusion of sorting is the “binomial”
probability

$$\binom{N}{n}\left(\frac{1}{M}\right)^{n}\left(1-\frac{1}{M}\right)^{N-n}. \qquad (14)$$
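Therefore the average values of $A$ and $B$ in the general case are

$$A_{\rm ave} = M\sum_n \binom{N}{n}\left(\frac{1}{M}\right)^{n}\left(1-\frac{1}{M}\right)^{N-n} H_n; \qquad (15)$$

$$B_{\rm ave} = M\sum_n \binom{N}{n}\left(\frac{1}{M}\right)^{n}\left(1-\frac{1}{M}\right)^{N-n} \frac12\binom{n}{2}. \qquad (16)$$

Using the identity

$$\binom{N}{n}\binom{n}{2} = \binom{N}{2}\binom{N-2}{n-2},$$

which is a special case of Eq. 1.2.6-(20), we can easily evaluate the sum in (16):

$$B_{\rm ave} = \frac{1}{2M}\binom{N}{2}. \qquad (17)$$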


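And exercise 37 derives the standard deviation of $B$. But the sum in (15) is
more difficult. By Theorem 1.2.7A, we have

$$\sum_n \binom{N}{n}(M-1)^{-n}H_n = \left(1-\frac{1}{M}\right)^{-N}(H_N - \ln M) + \epsilon,
\qquad 0 < \epsilon = \sum_{n>N}\frac{1}{n}\left(1-\frac{1}{M}\right)^{n-N} < \frac{M-1}{N+1};$$

hence

$$A_{\rm ave} = M(H_N - \ln M) + \delta, \qquad 0 < \delta < \frac{M^2}{N+1}\left(1-\frac{1}{M}\right)^{N+1}. \qquad (18)$$

(This formula is practically useless when $M \approx N$; exercise 40 gives a more
detailed analysis of the asymptotic behavior of $A_{\rm ave}$ when $M = N/\alpha$.)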
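Formulas (17) and (18) are easy to check numerically. The sketch below evaluates the sums (15) and (16) exactly with Python's `fractions.Fraction`, then compares $B_{\rm ave}$ with (17) and the error term $\delta$ with the bound in (18). The function names are our own, and the check assumes $M \ge 2$, since the identity above involves $(M-1)^{-n}$.

```python
from fractions import Fraction
from math import comb, log


def list_averages(N, M):
    """Exact A_ave and B_ave for Program M, from the sums (15) and (16)."""
    p = Fraction(1, M)
    a_ave = b_ave = Fraction(0)
    harmonic = Fraction(0)                           # H_n, built up term by term
    for n in range(N + 1):
        if n > 0:
            harmonic += Fraction(1, n)
        prob = comb(N, n) * p**n * (1 - p)**(N - n)  # binomial probability (14)
        a_ave += M * prob * harmonic
        b_ave += M * prob * Fraction(comb(n, 2), 2)
    return a_ave, b_ave


def delta(N, M):
    """delta = A_ave - M(H_N - ln M), the error term of (18), as a float."""
    a_ave, _ = list_averages(N, M)
    h_n = sum(Fraction(1, k) for k in range(1, N + 1))
    return float(a_ave - M * h_n) + M * log(M)


def delta_bound(N, M):
    """The upper bound on delta stated in (18)."""
    return M * M / (N + 1) * (1 - 1 / M) ** (N + 1)


for N, M in [(10, 3), (20, 7)]:
    _, b = list_averages(N, M)
    print(N, M, b == Fraction(comb(N, 2), 2 * M), delta(N, M), delta_bound(N, M))
```

Exact rational arithmetic keeps the tiny values of $\delta$ from being swamped by rounding error.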

By combining (17) and (18) we can deduce the total running time of Program M,
for fixed $M$ as $N \to \infty$:

$$\begin{aligned}
&\min\ 31N + M + 2,\\
&\text{ave}\ 1.75N^2/M + 31N - 3MH_N + 3M\ln M + 4M - 3\delta - 1.75N/M + 2,\\
&\max\ 3.50N^2 + 24.5N + 4M + 2.
\end{aligned} \qquad (19)$$

Notice that when $M$ is not too large we are speeding up the average time by
a factor of $M$; $M = 10$ will sort about ten times as fast as $M = 1$. However,
the maximum time is much larger than the average time; this reiterates the
assumption we have made about a fairly equal distribution of keys, since the
worst case occurs when all records pile onto the same list.

If we set $M = N$, the average running time of Program M is approximately
$34.36N$ units; when $M = \frac12 N$ it is slightly more, approximately $34.52N$; and
when $M = \frac{1}{10}N$ it is approximately $48.04N$. The additional cost of the supplementary
program in exercise 35, which links all $M$ lists together in a single
list, raises these times respectively to $44.99N$, $41.95N$, and $52.74N$. (Note that
$10N$ of these MIX time units are spent in the multiplication instruction alone!)
We have achieved a sorting method of order $N$, provided only that the keys are
reasonably well spread out over their range.
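To see the order-$N$ behavior empirically, the following Python sketch draws random keys, counts $A$ (right-to-left maxima, rightmost element included) and $B$ (inversions) within each list, and plugs them into the timing formula $7B + 31N - 3A + 4M + 2$. It evaluates that formula on sampled data; it does not emulate MIX. We assume distinct uniform keys in $[0, 1)$, with list index `int(m * k)` standing in for the multiplication step; `program_m_cost` is a hypothetical name. For $M = N$, $\frac12 N$, and $\frac{1}{10}N$ the printed ratios should come out near the averages quoted above.

```python
import random


def program_m_cost(keys, m):
    """Evaluate 7B + 31N - 3A + 4M + 2 for Program M on the given keys.

    Assumes distinct keys in [0, 1); key k belongs to list int(m * k).
    """
    lists = [[] for _ in range(m)]
    for k in keys:                          # each list keeps its keys in input order
        lists[min(int(m * k), m - 1)].append(k)
    a = b = 0
    for seq in lists:
        # A: right-to-left maxima; the rightmost key of a nonempty list counts.
        best = float("-inf")
        for k in reversed(seq):
            if k > best:
                a += 1
                best = k
        # B: inversions, counted directly (lists stay short for spread-out keys).
        b += sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    n = len(keys)
    return 7 * b + 31 * n - 3 * a + 4 * m + 2


rng = random.Random(1)                      # fixed seed for repeatability
n = 20000
keys = [rng.random() for _ in range(n)]
for m in (n, n // 2, n // 10):
    print(m, program_m_cost(keys, m) / n)   # MIX units per record
```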

Improvements  to  multiple  list  insertion  are  discussed  in  Section  5.2.5. 

EXERCISES 

1. [10] Is Algorithm S a stable sorting algorithm?

2. [11] Would Algorithm S still sort numbers correctly if the relation “$K \ge K_i$” in
step S3 were replaced by “$K > K_i$”?

► 3.  [30]  Is  Program  S the  shortest  possible  sorting  program  that  can  be  written  for 
MIX,  or  is  there  a shorter  program  that  achieves  the  same  effect? 

► 4.  [M20]  Find  the  minimum  and  maximum  running  times  for  Program  S,  as  a 
function  of  N. 

► 5. [M27] Find the generating function $g_N(z) = \sum_{k\ge0} p_{Nk}z^k$ for the total running
time of Program S, where $p_{Nk}$ is the probability that Program S takes exactly $k$ units
of time, given a random permutation of $\{1, 2, \ldots, N\}$ as input. Also calculate the
standard deviation of the running time, given $N$.

6.  [23]  The  two-way  insertion  method  illustrated  in  Table  2 seems  to  imply  that 
there  is  an  output  area  capable  of  holding  up  to  2 N + 1 records,  in  addition  to  the 
input  area  containing  N records.  Show  that  two-way  insertion  can  be  done  using  only 
enough  space  for  N + 1 records,  including  both  input  and  output. 

7. [M20] If $a_1 a_2 \ldots a_n$ is a random permutation of $\{1, 2, \ldots, n\}$, what is the average
value of $|a_1 - 1| + |a_2 - 2| + \cdots + |a_n - n|$? (This is $n$ times the average net distance
traveled by a record during a sorting process.)

8.  [10]  Is  Algorithm  D a stable  sorting  algorithm? 

9.  [20]  What  are  the  quantities  A and  B,  and  the  total  running  time  of  Program  D, 
corresponding  to  Tables  3 and  4?  Discuss  the  relative  merits  of  shellsort  versus  straight 
insertion  in  this  case. 

► 10. [22] If $K_j \ge K_{j-h}$ when we begin step D3, Algorithm D specifies a lot of actions
that accomplish nothing. Show how to modify Program D so that this redundant
computation can be avoided, and discuss the merits of such a modification.

11.  [M10]  What  path  in  a lattice  like  that  of  Fig.  11  corresponds  to  the  permutation 
1 2 5 3 7 4 8 6 9 11  10  12? 

12.  [M20]  Prove  that  the  area  between  a lattice  path  and  the  staircase  path  (as  shown 
in  Fig.  11)  equals  the  number  of  inversions  in  the  corresponding  2-ordered  permutation. 

► 13.  [M16]  Explain  how  to  put  weights  on  the  horizontal  line  segments  of  a lattice, 
instead  of  the  vertical  segments,  so  that  the  sum  of  the  horizontal  weights  on  a lattice 
path  is  the  number  of  inversions  in  the  corresponding  2-ordered  permutation. 

14. [M28] (a) Show that, in the sums defined by Eq. (2), we have $A_{2n+1} = 2A_{2n}$.
(b) The general identity of exercise 1.2.6-26 simplifies to

$$\sum_k \binom{2k+s}{k} z^k = \frac{1}{\sqrt{1-4z}}\left(\frac{1-\sqrt{1-4z}}{2z}\right)^{s}$$

if we set $r = s$, $t = -2$. By considering the sum $\sum_n A_{2n}z^n$, show that

$$A_{2n} = n \cdot 4^{n-1}.$$
► 15. [HM33] Let $g_n(z)$, $\bar g_n(z)$, $h_n(z)$, and $\bar h_n(z)$ be $\sum z^{\text{total weight of path}}$ summed over
all lattice paths of length $2n$ from $(0,0)$ to $(n,n)$, where the weight is defined as in
Fig. 11, subject to certain restrictions on the vertices on the paths: For $h_n(z)$, there is
no restriction, but for $g_n(z)$ the path must avoid all vertices $(i,j)$ with $i > j$; $\bar h_n(z)$ and
$\bar g_n(z)$ are defined similarly, except that all vertices $(i,i)$ are also excluded, for $0 < i < n$.
Thus

$$g_0(z) = 1,\quad g_1(z) = z,\quad g_2(z) = z^3 + z^2;\qquad \bar g_1(z) = z,\quad \bar g_2(z) = z^3;$$

$$h_0(z) = 1,\quad h_1(z) = z + 1,\quad h_2(z) = z^3 + z^2 + 3z + 1;\qquad \bar h_1(z) = z + 1,\quad \bar h_2(z) = z^3 + z.$$

Find recurrence relations defining these functions, and use these relations to prove that

$$h_n''(1) + h_n'(1) = \frac{7n^3 + 4n^2 + 4n}{30}\binom{2n}{n}.$$

(The exact formula for the variance of the number of inversions in a random 2-ordered
permutation of $\{1, 2, \ldots, 2n\}$ is therefore easily found; it is asymptotically $(\frac{7}{30} - \frac{\pi}{16})n^3$.)

16.  [M24]  Find  a formula  for  the  maximum  number  of  inversions  in  an  h-ordered 
permutation  of  {1,2,  ...,n}.  What  is  the  maximum  possible  number  of  moves  in 
Algorithm  D when  the  increments  satisfy  the  divisibility  condition  (5)? 

17. [M21] Show that, when $N = 2^t$ and $h_s = 2^s$ for $t > s \ge 0$, there is a unique
permutation of $\{1, 2, \ldots, N\}$ that maximizes the number of move operations performed
by Algorithm D. Find a simple way to describe this permutation.

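18. [HM24] For large $N$ the sum (6) can be estimated as

$$\frac14\,\frac{N^2}{h_{t-1}} + \frac{\sqrt{\pi}}{8}\left(\frac{N^{3/2}h_{t-1}^{1/2}}{h_{t-2}} + \cdots + \frac{N^{3/2}h_1^{1/2}}{h_0}\right).$$

What real values of $h_{t-1}, \ldots, h_0$ minimize this expression when $N$ and $t$ are fixed and
$h_0 = 1$?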

► 19. [M25] What is the average value of the quantity $A$ in the timing analysis of
Program D, when the increments satisfy the divisibility condition (5)?

20.  [M22]  Show  that  Theorem  K follows  from  Lemma  L. 

21. [M25] Let $h$ and $k$ be relatively prime positive integers, and say that an integer
is generable if it equals $xh + yk$ for some nonnegative integers $x$ and $y$. Show that $n$
is generable if and only if $hk - h - k - n$ is not generable. (Since 0 is the smallest
generable integer, the largest nongenerable integer must therefore be $hk - h - k$. It
follows that $K_i \le K_j$ whenever $j - i \ge (h-1)(k-1)$, in any file that is both $h$-ordered
and $k$-ordered.)

22. [M30] Prove that all integers $\ge 2^s(2^s - 1)$ can be represented in the form

$$a_0(2^s - 1) + a_1(2^{s+1} - 1) + a_2(2^{s+2} - 1) + \cdots,$$

where the $a_j$’s are nonnegative integers; but $2^s(2^s - 1) - 1$ cannot be so represented.
Furthermore, exactly $2^{s-1}(2^s + s - 3)$ positive integers are unrepresentable in this form.

Find analogous formulas when the quantities $2^k - 1$ are replaced by $2^k + 1$ in the
representations.
► 23. [M22] Prove that if $h_{s+2}$ and $h_{s+1}$ are relatively prime, the number of moves that
occur while Algorithm D is using the increment $h_s$ is $O(Nh_{s+2}h_{s+1}/h_s)$. Hint: See
exercise 21.

24. [M42] Prove that Theorem P is best possible, in the sense that the exponent 3/2
cannot be lowered.

► 25. [M22] How many permutations of $\{1, 2, \ldots, N\}$ are both 3-ordered and 2-ordered?
What is the maximum number of inversions in such a permutation? What is the total
number of inversions among all such permutations?

26. [M35] Can a file of $N$ elements have more than $N$ inversions if it is 3-, 5-, and
7-ordered? Estimate the maximum number of inversions when $N$ is large.

27. [M41] (Bjorn Poonen.) (a) Prove that there is a constant $c$ such that if $m$ of the
increments $h_s$ in Algorithm D are less than $N/2$, the running time is $\Omega(N^{1+c/\sqrt{m}})$ in the
worst case. (b) Consequently the worst-case running time is $\Omega(N(\log N/\log\log N)^2)$
for all sequences of increments.

28.  [15]  Which  sequence  of  increments  shown  in  Table  6 is  best  from  the  standpoint 
of  Program  D,  considering  the  average  total  running  time? 

29. [40] For $N = 1000$ and various values of $t$, find empirical values of $h_{t-1}, \ldots,
h_1, h_0$ for which the average number of moves, $B_{\rm ave}$, is as small as you can make it.

30. [M23] (V. Pratt.) If the set of increments in shellsort is $\{2^p3^q \mid 2^p3^q < N\}$,
show that the number of passes is approximately $\frac12(\log_2 N)(\log_3 N)$, and the number
of moves per pass is at most $N/2$. In fact, if $K_{j-h} > K_j$ on any pass, we will always
have $K_{j-3h}, K_{j-2h} \le K_j < K_{j-h} \le K_{j+h}, K_{j+2h}$; so we may simply interchange $K_{j-h}$
and $K_j$ and increase $j$ by $2h$, saving two of the comparisons of Algorithm D. Hint: See
exercise 25.

► 31. [25] Write a MIX program for Pratt’s sorting algorithm (exercise 30). Express its
running time in terms of quantities $A$, $B$, $S$, $T$, $N$ analogous to those in Program D.

32. [10] What would be the final contents of $L_0 L_1 \ldots L_{16}$ if the list insertion sort in
Table 8 were carried through to completion?

► 33.  [25]  Find  a way  to  improve  on  Program  L so  that  its  running  time  is  dominated 
by  5 B instead  of  7 B,  where  B is  the  number  of  inversions.  Discuss  corresponding 
improvements  to  Program  S. 

34.  [M10]  Verify  formula  (14). 

35. [21] Write a MIX program to follow Program M, so that all lists are combined into
a single list. Your program should set the LINK fields exactly as they would have been
set by Program L.

36.  [1<¥]  Assume  that  the  byte  size  of  MIX  is  100,  and  that  the  sixteen  example  keys 
in  Table  8 are  actually  503000,  087000,  512000,  . . . , 703000.  Determine  the  running 
time  of  Programs  L and  M on  this  data,  when  M = 4. 

37. [M25] Let $g_n(z)$ be the probability generating function for inversions in a random
permutation of $n$ objects, Eq. 5.1.1-(11). Let $g_{NM}(z)$ be the corresponding generating
function for the quantity $B$ in Program M. Show that

$$\sum_{N\ge0} g_{NM}(z)\,\frac{M^N w^N}{N!} = \left(\sum_{n\ge0} g_n(z)\,\frac{w^n}{n!}\right)^{M},$$

and use this formula to derive the variance of $B$.
38. [HM23] (R. M. Karp.) Let $F(x)$ be a distribution function for a probability
distribution, with $F(0) = 0$ and $F(1) = 1$. Given that the keys $K_1, K_2, \ldots, K_N$ are
independently chosen at random from this distribution, and that $M = cN$, where $c$
is constant and $N \to \infty$, prove that the average running time of Program M is $O(N)$
when $F$ is sufficiently smooth. (A key $K$ is inserted into list $j$ when $\lfloor MK\rfloor = j - 1$; this
occurs with probability $F(j/M) - F((j-1)/M)$. Only the case $F(x) = x$, $0 \le x \le 1$,
is treated in the text.)

39. [HM16] If a program runs in approximately $A/M + B$ units of time and uses
$C + M$ locations in memory, what choice of $M$ gives the minimum time × space?

► 40. [HM24] Find the asymptotic value of the average number of right-to-left maxima
that occur in multiple list insertion, Eq. (15), when $M = N/\alpha$ for fixed $\alpha$ as $N \to \infty$.
Carry out the expansion to an absolute error of $O(N^{-1})$, expressing your answer in
terms of the exponential integral function $E_1(z) = \int_z^\infty e^{-t}\,dt/t$.

41.  [HM26]  (a)  Prove  that  the  sum  of  the  first  (^)  elements  of  (10)  is  0(p2fc).  (b)  Now 
prove  Theorem  I. 

42. [HM43] Analyze the average behavior of shellsort when there are $t = 3$ increments
$h$, $g$, and 1, assuming that $h \perp g$. The first pass, $h$-sorting, obviously does a total of
$\frac14 N^2/h + O(N)$ moves.

a) Prove that the second pass, $g$-sorting, does $\frac{\sqrt{\pi}}{8}\bigl(\sqrt{h} - 1/\sqrt{h}\bigr)N^{3/2}/g + O(hN)$
moves.

b) Prove that the third pass, 1-sorting, does $\psi(h,g)N + O(g^3h^2)$ moves, where


d=l  j J 


! _ d\h~^ 

9 


► 43. [25] Exercise 33 uses a sentinel to speed up Algorithm S, by making the test
“$i > 0$” unnecessary in step S4. This trick does not apply to Algorithm D. Nevertheless,
show that there is an easy way to avoid testing “$i > 0$” in step D5, thereby speeding
up the inner loop of shellsort.

44. [M25] If $\pi = a_1 \ldots a_n$ and $\pi' = a'_1 \ldots a'_n$ are permutations of $\{1, \ldots, n\}$, say that
$\pi \le \pi'$ if the $i$th-largest element of $\{a_1, \ldots, a_j\}$ is less than or equal to the $i$th-largest
element of $\{a'_1, \ldots, a'_j\}$, for $1 \le i \le j \le n$. (In other words, $\pi \le \pi'$ if straight insertion
sorting of $\pi$ is componentwise less than or equal to straight insertion sorting of $\pi'$ after
the first $j$ elements have been inserted, for all $j$.)

a) If $\pi$ is above $\pi'$ in the sense of exercise 5.1.1-12, does it follow that $\pi \le \pi'$?

b) If $\pi \le \pi'$, does it follow that $\pi^R \ge \pi'^R$?

c) If $\pi \le \pi'$, does it follow that $\pi$ is above $\pi'$?


5.2.2. Sorting by Exchanging

We come now to the second family of sorting algorithms mentioned near the
beginning of Section 5.2: “exchange” or “transposition” methods that systematically
interchange pairs of elements that are out of order until no more such
pairs exist.

The process of straight insertion, Algorithm 5.2.1S, can be viewed as an
exchange method: We take each new record $R_j$ and essentially exchange it with
its neighbors to the left until it has been inserted into the proper place. Thus
the classification of sorting methods into various families such as “insertion,”
“exchange,” “selection,” etc., is not always clear-cut. In this section, we shall
discuss four types of sorting methods for which exchanging is a dominant characteristic:
exchange selection (the “bubble sort”); merge exchange (Batcher’s
parallel sort); partition exchange (Hoare’s “quicksort”); and radix exchange.

Fig. 14. The bubble sort in action.

The bubble sort. Perhaps the most obvious way to sort by exchanges is to
compare $K_1$ with $K_2$, interchanging $R_1$ and $R_2$ if the keys are out of order;
then do the same to records $R_2$ and $R_3$, $R_3$ and $R_4$, etc. During this sequence
of operations, records with large keys tend to move to the right, and in fact
the record with the largest key will move up to become $R_N$. Repetitions of the
process will get the appropriate records into positions $R_{N-1}$, $R_{N-2}$, etc., so that
all records will ultimately be sorted.

Figure 14 shows this sorting method in action on the sixteen keys 503 087
512 ... 703; it is convenient to represent the file of numbers vertically instead of
horizontally, with $R_N$ at the top and $R_1$ at the bottom. The method is called
“bubble sorting” because large elements “bubble up” to their proper position,
by contrast with the “sinking sort” (that is, straight insertion) in which elements
sink down to an appropriate level. The bubble sort is also known by more prosaic
names such as “exchange selection” or “propagation.”

After each pass through the file, it is not hard to see that all records above
and including the last one to be exchanged must be in their final position, so
they need not be examined on subsequent passes. Horizontal lines in Fig. 14
show the progress of the sorting from this standpoint; notice, for example, that
five more elements are known to be in final position as a result of Pass 4. On
the final pass, no exchanges are performed at all. With these observations we
are ready to formulate the algorithm.

Algorithm B (Bubble sort). Records $R_1, \ldots, R_N$ are rearranged in place; after
sorting is complete their keys will be in order, $K_1 \le \cdots \le K_N$.

B1. [Initialize BOUND.] Set BOUND ← N. (BOUND is the highest index for which
the record is not known to be in its final position; thus we are indicating
that nothing is known at this point.)

B2. [Loop on j.] Set t ← 0. Perform step B3 for j = 1, 2, ..., BOUND − 1, and
then go to step B4. (If BOUND = 1, this means go directly to B4.)

B3. [Compare/exchange $R_j:R_{j+1}$.] If $K_j > K_{j+1}$, interchange $R_j \leftrightarrow R_{j+1}$ and
set t ← j.

B4. [Any exchanges?] If t = 0, terminate the algorithm. Otherwise set BOUND ← t
and return to step B2.  |
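Algorithm B translates almost line for line into Python. The sketch below is our own illustration (0-based list indices, hypothetical function name); it also counts the three quantities used in the analysis that follows: passes $A$, exchanges $B$, and comparisons $C$. On the sixteen example keys of Fig. 14 it should give $A = 9$, $B = 41$, $C = 75$, and hence $8A + 7B + 8C + 1 = 960$.

```python
def bubble_sort(keys):
    """Algorithm B on a copy of keys; returns (sorted list, A, B, C)."""
    r = list(keys)
    passes = exchanges = comparisons = 0
    bound = len(r)                             # B1: BOUND <- N
    while True:
        passes += 1
        t = 0                                  # B2: t remembers the last exchange
        for j in range(1, bound):              # j = 1, 2, ..., BOUND - 1
            comparisons += 1
            if r[j - 1] > r[j]:                # B3: K_j > K_{j+1} (0-based list)
                r[j - 1], r[j] = r[j], r[j - 1]
                exchanges += 1
                t = j
        if t == 0:                             # B4: no exchanges, so we are done
            return r, passes, exchanges, comparisons
        bound = t                              # records above t are in final position


KEYS = [503, 87, 512, 61, 908, 170, 897, 275,
        653, 426, 154, 509, 612, 677, 765, 703]
r, a, b, c = bubble_sort(KEYS)
print(a, b, c, 8 * a + 7 * b + 8 * c + 1)
```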


Fig.  15.  Flow  chart  for  bubble  sorting. 


Program B (Bubble sort). As in previous MIX programs of this chapter, we
assume that the items to be sorted are in locations INPUT+1 through INPUT+N.
rI1 ≡ t; rI2 ≡ j.
Analysis of the bubble sort. It is quite instructive to analyze the running
time of Algorithm B. Three quantities are involved in the timing: the number
of passes, $A$; the number of exchanges, $B$; and the number of comparisons, $C$. If
the input keys are distinct and in random order, we may assume that they form
a random permutation of $\{1, 2, \ldots, n\}$. The idea of inversion tables (Section
5.1.1) leads to an easy way to describe the effect of each pass in a bubble sort.

Theorem I. Let $a_1 a_2 \ldots a_n$ be a permutation of $\{1, 2, \ldots, n\}$, and let $b_1 b_2 \ldots b_n$
be the corresponding inversion table. If one pass of the bubble sort, Algorithm B,
changes $a_1 a_2 \ldots a_n$ to the permutation $a'_1 a'_2 \ldots a'_n$, the corresponding inversion
table $b'_1 b'_2 \ldots b'_n$ is obtained from $b_1 b_2 \ldots b_n$ by decreasing each nonzero entry
by 1.

Proof. If $a_i$ is preceded by a larger element, the largest preceding element is
exchanged with it, so $b_{a_i}$ decreases by 1. But if $a_i$ is not preceded by a larger
element, it is never exchanged with a larger element, so $b_{a_i}$ remains 0.  |

Thus we can see what happens during a bubble sort by studying the sequence
of inversion tables between passes. For example, the successive inversion tables
corresponding to Fig. 14 are

$$\begin{array}{c}
\text{3 1 8 3 4 5 0 4 0 3 2 2 3 2 1 0}\\
\downarrow\ \text{Pass 1}\\
\text{2 0 7 2 3 4 0 3 0 2 1 1 2 1 0 0}\\
\downarrow\ \text{Pass 2}\\
\text{1 0 6 1 2 3 0 2 0 1 0 0 1 0 0 0}\\
\downarrow\ \text{Pass 3}\\
\text{0 0 5 0 1 2 0 1 0 0 0 0 0 0 0 0}
\end{array} \qquad (1)$$

and so on. If $b_1 b_2 \ldots b_n$ is the inversion table of the input permutation, we must
therefore have

$$A = 1 + \max(b_1, b_2, \ldots, b_n), \qquad (2)$$

$$B = b_1 + b_2 + \cdots + b_n, \qquad (3)$$

$$C = c_1 + c_2 + \cdots + c_A, \qquad (4)$$

where $c_j$ is the value of BOUND − 1 at the beginning of pass $j$. In terms of the
inversion table,

$$c_j = \max\{\,b_i + i \mid b_i \ge j - 1\,\} - j \qquad (5)$$

(see exercise 5). In example (1) we therefore have $A = 9$, $B = 41$, $C = 15 + 14 +
13 + 12 + 7 + 5 + 4 + 3 + 2 = 75$. The total MIX sorting time for Fig. 14 is 960u.
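Formulas (2) through (5) let us predict $A$, $B$, and $C$ from the inversion table alone, without running the sort. The Python sketch below (hypothetical function names, our own illustration) does this for the Fig. 14 input, written as ranks 1 to 16 (061 becomes 1, 087 becomes 2, and so on up to 908, which becomes 16). It also checks Theorem I exhaustively on all permutations of a small set, using one full left-to-right pass, which is what the first pass of Algorithm B does.

```python
from itertools import permutations


def inversion_table(perm):
    """b_j = number of elements to the left of j that are greater than j."""
    pos = {v: i for i, v in enumerate(perm)}
    return [sum(1 for v in perm[:pos[j]] if v > j) for j in range(1, len(perm) + 1)]


def predicted_counts(perm):
    """A, B, C for Algorithm B from formulas (2)-(5); perm is a nonempty permutation."""
    b = inversion_table(perm)
    a = 1 + max(b)                               # (2)
    total_b = sum(b)                             # (3)
    c = 0
    for j in range(1, a + 1):                    # (4): C = c_1 + ... + c_A
        # (5): c_j = max{b_i + i | b_i >= j - 1} - j, with i counted from 1
        c += max(bi + i for i, bi in enumerate(b, start=1) if bi >= j - 1) - j
    return a, total_b, c


def one_pass(perm):
    """One full left-to-right pass of compare/exchange steps."""
    p = list(perm)
    for j in range(len(p) - 1):
        if p[j] > p[j + 1]:
            p[j], p[j + 1] = p[j + 1], p[j]
    return p


def theorem_i_holds(n):
    """Check Theorem I on every permutation of {1, ..., n}."""
    for perm in permutations(range(1, n + 1)):
        before = inversion_table(perm)
        if inversion_table(one_pass(perm)) != [max(x - 1, 0) for x in before]:
            return False
    return True


ranks = [7, 2, 9, 1, 16, 4, 15, 5, 11, 6, 3, 8, 10, 12, 14, 13]
print(inversion_table(ranks))
print(predicted_counts(ranks))
print(theorem_i_holds(6))
```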

The distribution of $B$ (the total number of inversions in a random permutation)
is very well-known to us by now; so we are left with $A$ and $C$ to be
analyzed.

The probability that $A \le k$ is $1/n!$ times the number of inversion tables
having no components $\ge k$, namely $k^{n-k}k!$, when $1 \le k \le n$. Hence the
probability that exactly $k$ passes are required is

$$A_k = \frac{1}{n!}\bigl(k^{n-k}k! - (k-1)^{n-k+1}(k-1)!\bigr). \qquad (6)$$
The mean value $\sum kA_k$ can now be calculated; summing by parts, it is

$$A_{\rm ave} = n + 1 - \sum_{0\le k\le n}\frac{k^{n-k}k!}{n!} = n + 1 - P(n), \qquad (7)$$

where $P(n)$ is the function whose asymptotic value was found to be $\sqrt{\pi n/2} - \frac23 +
O(1/\sqrt{n})$ in Eq. 1.2.11.3-(24). Formula (7) was stated without proof by E. H.
Friend in JACM 3 (1956), 150; a proof was given by Howard B. Demuth [Ph.D.
Thesis (Stanford University, October 1956), 64-68]. For the standard deviation
of $A$, see exercise 7.
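Formulas (6) and (7) can be confirmed by brute force for small $n$. The Python sketch below (our own helper names) runs Algorithm B on every permutation of $\{1, \ldots, n\}$, counts the passes, and compares the empirical distribution and mean with the exact rational values from (6) and (7).

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def passes(perm):
    """Number of passes Algorithm B makes on perm (the last pass has no exchanges)."""
    r, bound, count = list(perm), len(perm), 0
    while True:
        count += 1
        t = 0
        for j in range(1, bound):
            if r[j - 1] > r[j]:
                r[j - 1], r[j] = r[j], r[j - 1]
                t = j
        if t == 0:
            return count
        bound = t


def prob_passes(n, k):
    """Formula (6): probability that exactly k passes are needed, 1 <= k <= n."""
    def at_most(m):                    # Pr(A <= m) = m^(n-m) m! / n!
        return Fraction(m ** (n - m) * factorial(m), factorial(n))
    return at_most(k) - at_most(k - 1)


def mean_passes(n):
    """Formula (7): A_ave = n + 1 - P(n), with P(n) = sum_k k^(n-k) k!/n!."""
    p = sum(Fraction(k ** (n - k) * factorial(k), factorial(n)) for k in range(n + 1))
    return n + 1 - p


for n in range(1, 7):
    counts = [passes(p) for p in permutations(range(1, n + 1))]
    print(n, Fraction(sum(counts), factorial(n)), mean_passes(n))
```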

The total number of comparisons, $C$, is somewhat harder to handle, and we
will consider only $C_{\rm ave}$. For fixed $n$, let $f_j(k)$ be the number of inversion tables
$b_1 \ldots b_n$ such that for $1 \le i \le n$ we have either $b_i < j - 1$ or $b_i + i - j \le k$; then

$$f_j(k) = (j+k)!\,(j-1)^{n-j-k}, \qquad \text{for } 0 \le k \le n - j. \qquad (8)$$

(See exercise 8.) The average value of $c_j$ in (5) is $\bigl(\sum_k k\,(f_j(k) - f_j(k-1))\bigr)/n!$;
summing by parts and then summing on $j$ leads to the formula

$$C_{\rm ave} = \binom{n+1}{2} - \frac{1}{n!}\sum_{\substack{1\le j\le n\\ 0\le k\le n-j}} f_j(k)
= \binom{n+1}{2} - \frac{1}{n!}\sum_{0\le r<s\le n} s!\,r^{n-s}. \qquad (9)$$

Here the asymptotic value is not easy to determine, and we shall return to it at
the end of this section.
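Although the asymptotics of (9) are deferred, the exact formula is easy to check for small $n$. This Python sketch (hypothetical helper names) totals BOUND − 1 over all passes of Algorithm B for every permutation of $\{1, \ldots, n\}$ and compares the average with the right-hand side of (9); Python evaluates `0 ** 0` as 1, which is the convention the sum needs when $r = 0$ and $s = n$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def comparisons(perm):
    """Total comparisons C made by Algorithm B: BOUND - 1 summed over all passes."""
    r, bound, total = list(perm), len(perm), 0
    while True:
        total += max(bound - 1, 0)
        t = 0
        for j in range(1, bound):
            if r[j - 1] > r[j]:
                r[j - 1], r[j] = r[j], r[j - 1]
                t = j
        if t == 0:
            return total
        bound = t


def c_ave_formula(n):
    """Right-hand side of (9): C(n+1, 2) - (1/n!) sum_{0<=r<s<=n} s! r^(n-s)."""
    total = sum(factorial(s) * r ** (n - s) for s in range(n + 1) for r in range(s))
    return Fraction(comb(n + 1, 2)) - Fraction(total, factorial(n))


def c_ave_brute(n):
    """Average of C over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(comparisons(p) for p in perms), len(perms))


for n in range(1, 8):
    print(n, c_ave_brute(n), c_ave_formula(n))
```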

To summarize our analysis of the bubble sort, the formulas derived above
and below may be written as follows:

$$A = \bigl(\min 1,\ \text{ave } N - \sqrt{\pi N/2} + O(1),\ \max N\bigr); \qquad (10)$$

$$B = \bigl(\min 0,\ \text{ave } \tfrac14(N^2 - N),\ \max \tfrac12(N^2 - N)\bigr); \qquad (11)$$

$$C = \bigl(\min N - 1,\ \text{ave } \tfrac12\bigl(N^2 - N\ln N - (\gamma + \ln 2 - 1)N\bigr) + O(\sqrt{N}),\ \max \tfrac12(N^2 - N)\bigr). \qquad (12)$$

In each case the minimum occurs when the input is already in order, and the
maximum occurs when it is in reverse order; so the MIX running time is $8A +
7B + 8C + 1 = (\min 8N + 1,\ \text{ave } 5.75N^2 + O(N\log N),\ \max 7.5N^2 + 0.5N + 1)$.

Refinements of the bubble sort. It took a good deal of work to analyze the
bubble sort; and although the techniques used in the calculations are instructive,
the results are disappointing since they tell us that the bubble sort isn’t really
very good at all. Compared to straight insertion (Algorithm 5.2.1S), bubble
sorting requires a more complicated program and takes more than twice as long!

Some  of  the  bubble  sort’s  deficiencies  are  easy  to  spot.  For  example,  in 
Fig.  14,  the  first  comparison  in  Pass  4 is  redundant,  as  are  the  first  two  in 
Pass  5 and  the  first  three  in  Passes  6 and  7.  Notice  also  that  elements  can  never 
move  to  the  left  more  than  one  step  per  pass;  so  if  the  smallest  item  happens 
to be initially at the far right we are forced to make the maximum number of comparisons. This suggests the “cocktail-shaker sort,” in which alternate passes go in opposite directions (see Fig. 16). The average number of comparisons is slightly reduced by this approach. K. E. Iverson [A Programming Language (Wiley, 1962), 218–219] made an interesting observation in this regard: If $j$ is an index such that $R_j$ and $R_{j+1}$ are not exchanged with each other on two consecutive passes in opposite directions, then $R_j$ and $R_{j+1}$ must be in their final position, and they need not enter into any subsequent comparisons. For example, traversing 4 3 2 1 8 6 9 7 5 from left to right yields 3 2 1 4 6 8 7 5 9: no interchange occurred between $R_4$ and $R_5$. When we traverse the latter permutation from right to left, we find $R_4$ still less than (the new) $R_5$, so we may immediately conclude that $R_4$ and $R_5$ need not participate in any further comparisons.

Fig. 16. The cocktail-shaker short [shic].
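Here is a minimal sketch of the cocktail-shaker idea, assuming a plain list of comparable keys and 0-based indices. It is a simplification of our own, not Iverson's full refinement: each pass only shrinks the active range to the position of its last exchange, because everything beyond that position is already final. The function names are hypothetical.

```python
def pass_left_to_right(a, lo, hi):
    """One bubble pass over a[lo..hi] (inclusive). Returns the index j of the last
    exchange of a[j] with a[j + 1], or lo if nothing moved; a[j+1..hi] is then final."""
    last = lo
    for j in range(lo, hi):
        if a[j] > a[j + 1]:
            a[j], a[j + 1] = a[j + 1], a[j]
            last = j
    return last


def pass_right_to_left(a, lo, hi):
    """One bubble pass from a[hi] down to a[lo]. Returns the index j of the last
    exchange of a[j - 1] with a[j], or hi if nothing moved; a[lo..j-1] is then final."""
    last = hi
    for j in range(hi, lo, -1):
        if a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            last = j
    return last


def cocktail_shaker_sort(a):
    """Sort the list a in place by alternating passes, shrinking the active range."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        hi = pass_left_to_right(a, lo, hi)
        lo = pass_right_to_left(a, lo, hi)
    return a


example = [4, 3, 2, 1, 8, 6, 9, 7, 5]
pass_left_to_right(example, 0, len(example) - 1)
print(example)    # [3, 2, 1, 4, 6, 8, 7, 5, 9], as in the text
```

Each left-to-right pass strictly lowers `hi`, so the loop always terminates. Like the refinements discussed next, this variant still makes on the order of $N^2$ comparisons in bad cases.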

But  none  of  these  refinements  lead  to  an  algorithm  better  than  straight 
insertion;  and  we  already  know  that  straight  insertion  isn’t  suitable  for  large  N. 
Another  idea  is  to  eliminate  most  of  the  exchanges;  since  most  elements  simply 
shift  left  one  step  during  an  exchange,  we  could  achieve  the  same  effect  by  viewing 
the  array  differently,  shifting  the  origin  of  indexing!  But  the  resulting  algorithm 
is  no  better  than  straight  selection,  Algorithm  5.2.3S,  which  we  shall  study  later. 

In  short,  the  bubble  sort  seems  to  have  nothing  to  recommend  it,  except  a 
catchy  name  and  the  fact  that  it  leads  to  some  interesting  theoretical  problems. 

Batcher’s parallel method. If we are going to have an exchange algorithm whose running time is faster than order $N^2$, we need to select some nonadjacent pairs of keys $(K_i, K_j)$ for comparisons; otherwise we will need as many exchanges as the original permutation has inversions, and the average number of inversions is $\frac14(N^2 - N)$. An ingenious way to program a sequence of comparisons, looking for potential exchanges, was discovered in 1964 by K. E. Batcher [see Proc. AFIPS Spring Joint Computer Conference 32 (1968), 307–314]. His method is not at all obvious; in fact, a fairly intricate proof is needed just to show that it is valid, since comparatively few comparisons are made. We shall discuss two proofs, one in this section and another in Section 5.3.4.


Fig.  17.  Algorithm  M. 


Batcher’s  sorting  scheme  is  similar  to  shellsort,  but  the  comparisons  are 
done  in  a novel  way  so  that  no  propagation  of  exchanges  is  necessary.  We  can, 
for instance, compare Table 1 (on the next page) to Table 5.2.1–3; Batcher’s
method  achieves  the  effect  of  8-sorting,  4-sorting,  2-sorting,  and  1-sorting,  but 
the  comparisons  do  not  overlap.  Since  Batcher’s  algorithm  essentially  merges 
pairs  of  sorted  subsequences,  it  may  be  called  the  “merge  exchange  sort.” 

Algorithm M (Merge exchange). Records $R_1, \ldots, R_N$ are rearranged in place; after sorting is complete their keys will be in order, $K_1 \le \cdots \le K_N$. We assume that $N \ge 2$.

M1. [Initialize p.] Set $p \gets 2^{t-1}$, where $t = \lceil \lg N \rceil$ is the least integer such that $2^t \ge N$. (Steps M2 through M5 will be performed for $p = 2^{t-1}, 2^{t-2}, \ldots, 1$.)

M2. [Initialize q, r, d.] Set $q \gets 2^{t-1}$, $r \gets 0$, $d \gets p$.

M3. [Loop on i.] For all $i$ such that $0 \le i < N - d$ and $i \mathbin{\&} p = r$, do step M4. Then go to step M5. (Here $i \mathbin{\&} p$ means the “bitwise and” of the binary representations of $i$ and $p$; each bit of the result is zero except where both $i$ and $p$ have 1-bits in corresponding positions. Thus $13 \mathbin{\&} 21 = (1101)_2 \mathbin{\&} (10101)_2 = (00101)_2 = 5$. At this point, $d$ is an odd multiple of $p$, and $p$ is a power of 2, so that $i \mathbin{\&} p \ne (i + d) \mathbin{\&} p$; it follows that the actions of step M4 can be done for all relevant $i$ in any order, even simultaneously.)

M4. [Compare/exchange $R_{i+1} : R_{i+d+1}$.] If $K_{i+1} > K_{i+d+1}$, interchange the records $R_{i+1} \leftrightarrow R_{i+d+1}$.

M5. [Loop on q.] If $q \ne p$, set $d \gets q - p$, $q \gets q/2$, $r \gets p$, and return to M3.

M6. [Loop on p.] (At this point the permutation $K_1 K_2 \ldots K_N$ is $p$-ordered.) Set $p \gets \lfloor p/2 \rfloor$. If $p > 0$, go back to M2. |
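Algorithm M translates almost line for line into the following sketch. It assumes a Python list of mutually comparable keys and switches to 0-based indexing, so the text's $R_{i+1} : R_{i+d+1}$ becomes `keys[i] : keys[i + d]`. The function name is hypothetical.

```python
def merge_exchange_sort(keys):
    """Sort the list `keys` in place by Algorithm M (Batcher's merge exchange) and return it.
    0-based indexing: the text's R_{i+1} and R_{i+d+1} are keys[i] and keys[i + d]."""
    n = len(keys)
    if n < 2:                      # the algorithm assumes N >= 2
        return keys
    t = (n - 1).bit_length()       # M1: least t with 2**t >= n
    p = 1 << (t - 1)
    while p > 0:
        q, r, d = 1 << (t - 1), 0, p           # M2
        while True:
            for i in range(n - d):             # M3: 0 <= i < N - d with i & p == r
                if i & p == r and keys[i] > keys[i + d]:
                    keys[i], keys[i + d] = keys[i + d], keys[i]   # M4
            if q == p:                         # M5
                break
            d, q, r = q - p, q // 2, p         # uses the old q on the right-hand side
        p //= 2                                # M6
    return keys


print(merge_exchange_sort([503, 87, 512, 61, 908, 170, 897, 275,
                           653, 426, 154, 509, 612, 677, 765, 703]))
```

The pairs touched in one pass of the inner `for` loop are disjoint, which is why that loop could run in parallel. Tracing the sixteen keys through the sketch reproduces the rows of Table 1.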
Table 1

MERGE-EXCHANGE SORTING (BATCHER’S METHOD)

p  q  r  d
              503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
8  8  0  8    503 087 154 061 612 170 765 275 653 426 512 509 908 677 897 703
4  8  0  4    503 087 154 061 612 170 765 275 653 426 512 509 908 677 897 703
4  4  4  4    503 087 154 061 612 170 512 275 653 426 765 509 908 677 897 703
2  8  0  2    154 061 503 087 512 170 612 275 653 426 765 509 897 677 908 703
2  4  2  6    154 061 503 087 512 170 612 275 653 426 765 509 897 677 908 703
2  2  2  2    154 061 503 087 512 170 612 275 653 426 765 509 897 677 908 703
1  8  0  1    061 154 087 503 170 512 275 612 426 653 509 765 677 897 703 908
1  4  1  7    061 154 087 503 170 512 275 612 426 653 509 765 677 897 703 908
1  2  1  3    061 154 087 275 170 426 503 509 512 653 612 703 677 897 765 908
1  1  1  1    061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908


Table 1 illustrates the method for $N = 16$. Notice that the algorithm sorts $N$ elements essentially by sorting $R_1, R_3, R_5, \ldots$ and $R_2, R_4, R_6, \ldots$ independently; then we perform steps M2 through M5 for $p = 1$, in order to merge the two sorted sequences together.

In order to prove that the magic sequence of comparison/exchanges specified in Algorithm M actually will sort all possible input files $R_1 R_2 \ldots R_N$, we must show only that steps M2 through M5 will merge all 2-ordered files $R_1 R_2 \ldots R_N$ when $p = 1$. For this purpose we can use the lattice-path method of Section 5.2.1 (see Fig. 11 on page 87); each 2-ordered permutation of $\{1, 2, \ldots, N\}$ corresponds uniquely to a path from $(0, 0)$ to $(\lceil N/2 \rceil, \lfloor N/2 \rfloor)$ in a lattice diagram. Figure 18(a) shows an example for $N = 16$, corresponding to the permutation 1 3 2 4 10 5 11 6 13 7 14 8 15 9 16 12. When we perform step M3 with $p = 1$, $q = 2^{t-1}$, $r = 0$, $d = 1$, the effect is to compare (and possibly exchange) $R_1 : R_2$, $R_3 : R_4$, etc. This operation corresponds to a simple transformation of the lattice path, “folding” it about the diagonal if necessary so that it never goes above the diagonal. (See Fig. 18(b) and the proof in exercise 10.) The next iterations of step M3 have $p = r = 1$, and $d = 2^{t-1} - 1, 2^{t-2} - 1, \ldots, 1$; their effect is to compare/exchange $R_2 : R_{2+d}$, $R_4 : R_{4+d}$, etc., and again there is a simple lattice interpretation: The path is “folded” about a line $\frac12(d + 1)$ units below the diagonal. See Fig. 18(c) and (d); eventually we get to the path in Fig. 18(e), which corresponds to a completely sorted permutation. This completes a “geometric proof” that Batcher’s algorithm is valid; we might call it sorting by folding!


Fig.  18.  A geometric  interpretation  of  Batcher’s  method,  N = 16. 


A MIX  program  for  Algorithm  M appears  in  exercise  12.  Unfortunately  the 
amount  of  bookkeeping  needed  to  control  the  sequence  of  comparisons  is  rather 
large,  so  the  program  is  less  efficient  than  other  methods  we  have  seen.  But  it  has 
one  important  redeeming  feature:  All  comparison/exchanges  specified  by  a given 
iteration of step M3 can be done simultaneously, on computers or networks that allow parallel computations. With such parallel operations, sorting is completed in $\frac12\lceil \lg N\rceil(\lceil \lg N\rceil + 1)$ steps, and this is about as fast as any general method known. For example, 1024 elements can be sorted in only 55 parallel steps by Batcher’s method. The nearest competitor is Pratt’s method (see exercise 5.2.1–30), which uses either 40 or 73 steps, depending on how we count; if we are
willing  to  allow  overlapping  comparisons  as  long  as  no  overlapping  exchanges 
are  necessary,  Pratt’s  method  requires  only  40  comparison/exchange  cycles  to 
sort  1024  elements.  For  further  comments,  see  Section  5.3.4. 
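Counting the executions of step M3 confirms the parallel step count quoted above. This sketch follows only the control flow of steps M1, M2, M5, and M6 and skips the comparisons themselves. `batcher_rounds` is a hypothetical name.

```python
def batcher_rounds(n):
    """Number of times step M3 is executed by Algorithm M for n >= 2 keys.
    Every compare/exchange within one execution of M3 is independent, so this
    is the number of parallel steps."""
    t = (n - 1).bit_length()       # least t with 2**t >= n
    rounds = 0
    p = 1 << (t - 1)
    while p > 0:                   # M2 .. M6 for p = 2**(t-1), ..., 1
        q = 1 << (t - 1)
        while True:
            rounds += 1            # one execution of M3
            if q == p:
                break
            q //= 2                # M5 (d and r do not affect the count)
        p //= 2
    return rounds


print(batcher_rounds(16), batcher_rounds(1024))   # 10 55
```

For $p = 2^k$, the inner loop runs $t - k$ times. Summing over $k$ gives $t(t+1)/2$ with $t = \lceil \lg N \rceil$, which is the formula in the text. The ten rounds for $N = 16$ are the ten rows of Table 1.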

Quicksort.  The  sequence  of  comparisons  in  Batcher’s  method  is  predetermined; 
we  compare  the  same  pairs  of  keys  each  time,  regardless  of  what  we  may  have 
learned  about  the  file  from  previous  comparisons.  The  same  is  largely  true  of  the 
bubble  sort,  although  Algorithm  B does  make  limited  use  of  previous  knowledge 
in  order  to  reduce  its  work  at  the  right  end  of  the  file.  Let  us  now  turn  to  a 
quite  different  strategy,  which  uses  the  result  of  each  comparison  to  determine 
what  keys  are  to  be  compared  next.  Such  a strategy  is  inappropriate  for  parallel 
computations,  but  on  computers  that  work  serially  it  can  be  quite  fruitful. 

The basic idea of the following method is to take one record, say $R_1$, and to move it to the final position that it should occupy in the sorted file, say position $s$. While determining this final position, we will also rearrange the other records so that there will be none with greater keys to the left of position $s$, and none with smaller keys to the right. Thus the file will have been partitioned in such a way that the original sorting problem is reduced to two simpler problems, namely to sort $R_1 \ldots R_{s-1}$ and (independently) to sort $R_{s+1} \ldots R_N$. We can apply the same technique to each of these subfiles, until the job is done.

There are several ways to achieve such a partitioning into left and right subfiles; the following scheme due to R. Sedgewick seems to be best, for reasons that will become clearer when we analyze the algorithm: Keep two pointers, $i$ and $j$, with $i = 2$ and $j = N$ initially. If $R_i$ is eventually supposed to be part of the left-hand subfile after partitioning (we can tell this by comparing $K_i$ with $K_1$), increase $i$ by 1, and continue until encountering a record $R_i$ that belongs to the right-hand subfile. Similarly, decrease $j$ by 1 until encountering a record $R_j$ belonging to the left-hand subfile. If $i < j$, exchange $R_i$ with $R_j$; then move on to process the next records in the same way, “burning the candle at both ends” until $i \ge j$. The partitioning is finally completed by exchanging $R_j$ with $R_1$. For example, consider what happens to our file of sixteen numbers:


Initial file:     [503  087  512  061  908  170  897  275  653  426  154  509  612  677  765  703]
1st exchange:      503  087 *512  061  908  170  897  275  653  426 *154  509  612  677  765  703
2nd exchange:      503  087  154  061 *908  170  897  275  653 *426  512  509  612  677  765  703
3rd exchange:      503  087  154  061  426  170 *897 *275  653  908  512  509  612  677  765  703
Pointers cross:    503  087  154  061  426  170 *275 *897  653  908  512  509  612  677  765  703
                                                  j    i
Partitioned file: [275  087  154  061  426  170] 503 [897  653  908  512  509  612  677  765  703]

(In order to indicate the positions of $i$ and $j$, keys $K_i$ and $K_j$ are marked here with an asterisk.)
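The partitioning stage just illustrated can be sketched as follows. The sketch assumes numeric keys, so that `math.inf` can stand in for an artificial key just beyond the right end of the file. Indices are 1-based as in the text. The function name is hypothetical. Starting at $i = 1$, $j = N + 1$ and advancing before each test is equivalent to the text's $i = 2$, $j = N$.

```python
import math


def partition_first_stage(keys):
    """One partitioning stage of the whole file around K_1, using 1-based positions
    as in the text. Returns (rearranged keys, s), where s is the pivot's final position."""
    n = len(keys)
    K = [None] + list(keys) + [math.inf]   # K[0] unused; K[n+1] = +inf stops the i scan
    pivot = K[1]
    i, j = 1, n + 1
    while True:
        i += 1
        while K[i] < pivot:     # R_i belongs to the left part: keep going
            i += 1
        j -= 1
        while pivot < K[j]:     # R_j belongs to the right part: keep going
            j -= 1
        if j <= i:              # the pointers have crossed (or met)
            break
        K[i], K[j] = K[j], K[i]
    K[1], K[j] = K[j], K[1]     # finally exchange R_j with R_1
    return K[1:n + 1], j


file16 = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
print(partition_first_stage(file16))
```

On the sixteen keys, the sketch makes the same three exchanges shown above and returns $s = 7$, with 503 in its final position.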

Table 2 shows how our example file gets completely sorted by this approach, in 11 stages. Brackets indicate subfiles that still need to be sorted; double brackets identify the subfile of current interest. Inside a computer, the current subfile can be represented by boundary values $(l, r)$, and the other subfiles by a stack of additional pairs $(l_k, r_k)$. Whenever a file is subdivided, we put the longer subfile on the stack and commence work on the shorter one, until we reach trivially short files; this strategy guarantees that the stack will never contain more than $\lg N$ entries (see exercise 20).

The sorting procedure just described may be called partition-exchange sorting; it is due to C. A. R. Hoare, whose interesting paper [Comp. J. 5 (1962), 10–15] contains one of the most comprehensive accounts of a sorting method that has ever been published. Hoare dubbed his method “quicksort,” and that name is not inappropriate, since the inner loops of the computation are extremely fast on most computers. All comparisons during a given stage are made against the same key, so this key may be kept in a register. Only a single index needs to be changed between comparisons. Furthermore, the amount of data movement is quite reasonable; the computation in Table 2, for example, makes only 17 exchanges.

Table 2

QUICKSORTING

                                                                      (l,r)    Stack
[[503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703]]   (1,16)   -
[[275 087 154 061 426 170]] 503 [897 653 908 512 509 612 677 765 703] (1,6)    (8,16)
[[170 087 154 061]] 275 426 503 [897 653 908 512 509 612 677 765 703] (1,4)    (8,16)
[[061 087 154]] 170 275 426 503 [897 653 908 512 509 612 677 765 703] (1,3)    (8,16)
061 [[087 154]] 170 275 426 503 [897 653 908 512 509 612 677 765 703] (2,3)    (8,16)
061 087 154 170 275 426 503 [[897 653 908 512 509 612 677 765 703]]   (8,16)   -
061 087 154 170 275 426 503 [[765 653 703 512 509 612 677]] 897 908   (8,14)   -
061 087 154 170 275 426 503 [[677 653 703 512 509 612]] 765 897 908   (8,13)   -
061 087 154 170 275 426 503 [[509 653 612 512]] 677 703 765 897 908   (8,11)   -
061 087 154 170 275 426 503 509 [[653 612 512]] 677 703 765 897 908   (9,11)   -
061 087 154 170 275 426 503 509 [[512 612]] 653 677 703 765 897 908   (9,10)   -
061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908       -        -

The  bookkeeping  required  to  control  i,  j,  and  the  stack  is  not  difficult,  but 
it  makes  the  quicksort  partitioning  procedure  most  suitable  for  fairly  large  N. 
Therefore  the  following  algorithm  uses  another  strategy  after  the  subfiles  have 
become  short. 

Algorithm Q (Quicksort). Records $R_1, \ldots, R_N$ are rearranged in place; after sorting is complete their keys will be in order, $K_1 \le \cdots \le K_N$. An auxiliary stack with at most $\lfloor \lg N \rfloor$ entries is needed for temporary storage. This algorithm follows the quicksort partitioning procedure described in the text above, with slight modifications for extra efficiency:

a) We assume the presence of artificial keys $K_0 = -\infty$ and $K_{N+1} = +\infty$ such that

$$K_0 \le K_i \le K_{N+1} \qquad \text{for } 1 \le i \le N. \qquad (13)$$

(Equality is allowed.)

b) Subfiles of $M$ or fewer elements are left unsorted until the very end of the procedure; then a single pass of straight insertion is used to produce the final ordering. Here $M \ge 1$ is a parameter that should be chosen as described in the text below. (This idea, due to R. Sedgewick, saves some of the overhead that would be necessary if we applied straight insertion directly to each small subfile, unless locality of reference is significant.)

c) Records with equal keys are exchanged, although it is not strictly necessary to do so. (This idea, due to R. C. Singleton, keeps the inner loops fast and helps to split subfiles nearly in half when equal elements are present; see exercise 18.)

Q1. [Initialize.] If $N \le M$, go to step Q9. Otherwise set the stack empty, and set $l \gets 1$, $r \gets N$.

Fig. 19. Partition-exchange sorting (quicksort).

Q2. [Begin new stage.] (We now wish to sort the subfile $R_l \ldots R_r$; from the nature of the algorithm, we have $r \ge l + M$, and $K_{l-1} \le K_i \le K_{r+1}$ for $l \le i \le r$.) Set $i \gets l$, $j \gets r + 1$; and set $K \gets K_l$. (The text below discusses alternative choices for $K$ that might be better.)

Q3. [Compare $K_i : K$.] (At this point the file has been rearranged so that

$$K_k \le K \quad \text{for } l - 1 \le k \le i, \qquad K \le K_k \quad \text{for } j \le k \le r + 1; \qquad (14)$$

and $l \le i < j$.) Increase $i$ by 1; then if $K_i < K$, repeat this step. (Since $K_j \ge K$, the iteration must terminate with $i \le j$.)

Q4. [Compare $K : K_j$.] Decrease $j$ by 1; then if $K < K_j$, repeat this step. (Since $K \ge K_{i-1}$, the iteration must terminate with $j \ge i - 1$.)

Q5. [Test $i : j$.] (At this point, (14) holds except for $k = i$ and $k = j$; also $K_i \ge K \ge K_j$, and $r \ge j \ge i - 1 \ge l$.) If $j \le i$, interchange $R_l \leftrightarrow R_j$ and go to step Q7.

Q6. [Exchange.] Interchange $R_i \leftrightarrow R_j$ and go back to step Q3.

Q7. [Put on stack.] (Now the subfile $R_l \ldots R_j \ldots R_r$ has been partitioned so that $K_k \le K_j$ for $l - 1 \le k \le j$ and $K_j \le K_k$ for $j \le k \le r + 1$.) If $r - j \ge j - l > M$, insert $(j+1, r)$ on top of the stack, set $r \gets j - 1$, and go to Q2. If $j - l > r - j > M$, insert $(l, j-1)$ on top of the stack, set $l \gets j + 1$, and go to Q2. (Each entry $(a, b)$ on the stack is a request to sort the subfile $R_a \ldots R_b$ at some future time.) Otherwise if $r - j > M \ge j - l$, set $l \gets j + 1$ and go to Q2; or if $j - l > M \ge r - j$, set $r \gets j - 1$ and go to Q2.

Q8. [Take off stack.] If the stack is nonempty, remove its top entry $(l', r')$, set $l \gets l'$, $r \gets r'$, and return to step Q2.

Q9. [Straight insertion sort.] For $j = 2, 3, \ldots, N$, if $K_{j-1} > K_j$ do the following operations: Set $K \gets K_j$, $R \gets R_j$, $i \gets j - 1$; then set $R_{i+1} \gets R_i$ and $i \gets i - 1$ one or more times until $K_i \le K$; then set $R_{i+1} \gets R$. (This is Algorithm 5.2.1S, modified as suggested in exercise 5.2.1–10 and answer 5.2.1–33. Step Q9 may be omitted if $M = 1$. Caution: The final straight insertion might conceal bugs in steps Q1–Q8; don’t trust an implementation just because it gives the correct answers!) |
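The following is a direct transcription of steps Q1 through Q9, offered as a sketch rather than a reference implementation. It assumes numeric keys, so that `-math.inf` and `math.inf` can serve as the artificial keys of (13). It also adds two counters of our own, the list of $(l, r)$ stages and the number of exchanges in Q5 and Q6, so that a run can be compared with Table 2. `algorithm_q` is a hypothetical name.

```python
import math


def algorithm_q(keys, M=9):
    """Sketch of Algorithm Q. Returns (sorted list, stages, exchanges), where stages lists
    the (l, r) subfiles partitioned in Q2 and exchanges counts the swaps in Q5 and Q6."""
    n = len(keys)
    K = [-math.inf] + list(keys) + [math.inf]   # artificial keys K_0 and K_{N+1}, as in (13)
    stages, exchanges, stack = [], 0, []
    if n > M:                                   # Q1
        l, r = 1, n
        while True:
            stages.append((l, r))               # Q2: begin new stage
            i, j, pivot = l, r + 1, K[l]
            while True:
                i += 1                          # Q3
                while K[i] < pivot:
                    i += 1
                j -= 1                          # Q4
                while pivot < K[j]:
                    j -= 1
                if j <= i:                      # Q5
                    K[l], K[j] = K[j], K[l]
                    exchanges += 1
                    break
                K[i], K[j] = K[j], K[i]         # Q6
                exchanges += 1
            left, right = j - l, r - j          # Q7
            if right >= left > M:
                stack.append((j + 1, r))
                r = j - 1
            elif left > right > M:
                stack.append((l, j - 1))
                l = j + 1
            elif right > M >= left:
                l = j + 1
            elif left > M >= right:
                r = j - 1
            elif stack:                         # Q8
                l, r = stack.pop()
            else:
                break
    for j in range(2, n + 1):                   # Q9: one straight-insertion pass
        if K[j - 1] > K[j]:
            key, i = K[j], j - 1
            while True:
                K[i + 1] = K[i]
                i -= 1
                if K[i] <= key:                 # K_0 = -inf guarantees this stops
                    break
            K[i + 1] = key
    return K[1:n + 1], stages, exchanges


out, stages, exchanges = algorithm_q(
    [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703], M=1)
print(out)
print(len(stages), exchanges)   # 11 17
```

With `M=1` on the sixteen keys of Table 2, the sketch visits the eleven stages listed there. It performs 17 exchanges, counting the final interchange in Q5 of each stage.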

The  corresponding  MIX  program  is  rather  long,  but  not  complicated;  in  fact, 
a large  part  of  the  coding  is  devoted  to  step  Q7,  which  just  fools  around  with 
the  variables  in  a very  straightforward  way. 

Program Q (Quicksort). Records to be sorted appear in locations INPUT+1 through INPUT+N; assume that locations INPUT and INPUT+N+1 contain, respectively, the smallest and largest values possible in MIX. The stack is kept in locations STACK+1, STACK+2, ...; see exercise 20 for the exact number of locations to set aside for the stack. rI2 ≡ l, rI3 ≡ r, rI4 ≡ i, rI5 ≡ j, rI6 ≡ size of stack, rA ≡ K ≡ R. We assume that N > M.

   A      EQU   2:3                           First component of stack entry.
   B      EQU   4:5                           Second component of stack entry.
01 START  ENT6  0               1             Q1. Initialize. Set stack empty.
02        ENT2  1               1             l ← 1.
03        ENT3  N               1             r ← N.
04 2H     ENT5  1,3             A             Q2. Begin new stage. j ← r + 1.
05        LDA   INPUT,2         A             K ← K_l.
06        ENT4  1,2             A             i ← l + 1.
07        JMP   0F              A             To Q3 omitting “i ← i + 1”.
08 6H     LDX   INPUT,4         B             Q6. Exchange.
09        ENT1  INPUT,4         B
10        MOVE  INPUT,5         B
11        STX   INPUT,5         B             R_i ↔ R_j.
12 3H     INC4  1               C′ − A        Q3. Compare K_i : K. i ← i + 1.
13 0H     CMPA  INPUT,4         C′
14        JG    3B              C′            Repeat if K > K_i.
15 4H     DEC5  1               C − C′        Q4. Compare K : K_j. j ← j − 1.
16        CMPA  INPUT,5         C − C′
17        JL    4B              C − C′        Repeat if K < K_j.
18 5H     ENTX  0,5             B + A         Q5. Test i : j.
19        DECX  0,4             B + A
20        JXP   6B              B + A         To Q6 if j > i.
21        LDX   INPUT,5         A
22        STX   INPUT,2         A             R_l ← R_j.
23        STA   INPUT,5         A             R_j ← R.
24 7H     ENT4  0,3             A             Q7. Put on stack.
25        DEC4  M,5             A             rI4 ← r − j − M.
26        ENT1  0,5             A
27        DEC1  M,2             A             rI1 ← j − l − M.
28        ENTA  0,4             A
29        DECA  0,1             A
30        JANN  1F              A             Jump if r − j ≥ j − l.
31        J1NP  8F              A′            To Q8 if M ≥ j − l > r − j.
32        J4NP  3F              S′ + A″       Jump if j − l > M ≥ r − j.
33        INC6  1               S′            (Now j − l > r − j > M.)
34        ST2   STACK,6(A)      S′
35        ENTA  -1,5            S′
36        STA   STACK,6(B)      S′            (l, j − 1) ⇒ stack.
37 4H     ENT2  1,5             S′ + A‴       l ← j + 1.
38        JMP   2B              S′ + A‴       To Q2.
39 1H     J4NP  8F              A − A′        To Q8 if M ≥ r − j ≥ j − l.
40        J1NP  4B              S − S′ + A‴   Jump if r − j > M ≥ j − l.
41        INC6  1               S − S′        (Now r − j ≥ j − l > M.)
42        ST3   STACK,6(B)      S − S′
43        ENTA  1,5             S − S′
44        STA   STACK,6(A)      S − S′        (j + 1, r) ⇒ stack.
45 3H     ENT3  -1,5            S − S′ + A″   r ← j − 1.
46        JMP   2B              S − S′ + A″   To Q2.
47 8H     LD2   STACK,6(A)      S + 1         Q8. Take off stack.
48        LD3   STACK,6(B)      S + 1
49        DEC6  1               S + 1         (l, r) ⇐ stack.
50        J6NN  2B              S + 1         To Q2 if stack wasn’t empty.
51 9H     ENT5  2-N             1             Q9. Straight insertion sort. j ← 2.
52 2H     LDA   INPUT+N,5       N − 1         K ← K_j, R ← R_j.
53        CMPA  INPUT+N-1,5     N − 1         (In this loop, rI5 ≡ j − N.)
54        JGE   6F              N − 1         Jump if K ≥ K_{j−1}.
55 3H     ENT4  N-1,5           D             i ← j − 1.
56 4H     LDX   INPUT,4         E
57        STX   INPUT+1,4       E             R_{i+1} ← R_i.
58        DEC4  1               E             i ← i − 1.
59        CMPA  INPUT,4         E
60        JL    4B              E             Repeat if K < K_i.
61 5H     STA   INPUT+1,4       D             R_{i+1} ← R.
62 6H     INC5  1               N − 1
63        J5NP  2B              N − 1         2 ≤ j ≤ N. |

Analysis of quicksort. The timing information shown with Program Q is not hard to derive using Kirchhoff’s conservation law (Section 1.3.3) and the fact that everything put onto the stack is eventually removed again. Kirchhoff’s law applied at Q2 also shows that

$$A = 1 + (S' + A''') + (S - S' + A'') + S = 2S + 1 + A'' + A''', \qquad (15)$$

hence the total running time comes to

$$24A + 11B + 4C + 3D + 8E + 7N + 9S \text{ units},$$

where 

A = number  of  partitioning  stages; 

B = number  of  exchanges  in  step  Q6; 

C = number  of  comparisons  made  while  partitioning; 

D = number of times $K_{j-1} > K_j$ during straight insertion (step Q9);

E = number  of  inversions  removed  by  straight  insertion; 

S = number  of  times  an  entry  is  put  on  the  stack.  (16) 
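The total $24A + 11B + 4C + 3D + 8E + 7N + 9S$ can be checked line by line against Program Q. What follows is our own reconstruction, not a derivation quoted from the text. It assumes the standard MIX timings: 2 units for each load, store, and compare instruction used (LDA, LDX, LD2, LD3, STA, STX, ST2, ST3, CMPA), 3 units for a one-word MOVE, and 1 unit for every other instruction in the program. Multiplying each line's time by its frequency and summing gives

$$T = 21A + 3A'' + 3A''' + 11B + 4C + 3D + 8E + 7N + 15S + 3,$$

in which the terms in $A'$, $C'$, and $S'$ cancel. By (15), $A'' + A''' = A - 2S - 1$, so

$$T = 21A + 3(A - 2S - 1) + 11B + 4C + 3D + 8E + 7N + 15S + 3 = 24A + 11B + 4C + 3D + 8E + 7N + 9S.$$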
By analyzing these six quantities, we will be able to make an intelligent choice of
the  parameter  M that  specifies  the  “threshold”  between  straight  insertion  and 
partitioning.  The  analysis  is  particularly  instructive  because  the  algorithm  is 
rather  complex;  the  unraveling  of  this  complexity  makes  a particularly  good 
illustration  of  important  techniques.  However,  nonmathematical  readers  are 
advised  to  skip  to  Eq.  (25). 

As  in  most  other  analyses  of  this  chapter,  we  shall  assume  that  the  keys  to 
be  sorted  are  distinct;  exercise  18  indicates  that  equalities  between  keys  do  not 
seriously  harm  the  efficiency  of  Algorithm  Q,  and  in  fact  they  seem  to  help  it. 
Since  the  method  depends  only  on  the  relative  order  of  the  keys,  we  may  as  well 
assume  that  they  are  simply  {1,2,...,  N}  in  some  order. 

We  can  attack  this  problem  by  considering  the  behavior  of  the  very  first 
partitioning  stage,  which  takes  us  to  Q7  for  the  first  time.  Once  this  partitioning 
has been achieved, both of the subfiles $R_1 \ldots R_{j-1}$ and $R_{j+1} \ldots R_N$ will be in
random  order  if  the  original  file  was  in  random  order,  since  the  relative  order  of 
elements  in  these  subfiles  has  no  effect  on  the  partitioning  algorithm.  Therefore 
the  contribution  of  subsequent  partitionings  can  be  determined  by  induction 
on  N.  (This  is  an  important  observation,  since  some  alternative  algorithms  that 
violate  this  property  have  turned  out  to  be  significantly  slower;  see  Computing 
Surveys  6 (1974),  287-289.) 

Let $s$ be the value of the first key, $K_1$, and assume that exactly $t$ of the first $s$ keys $\{K_1, \ldots, K_s\}$ are greater than $s$. (Remember that the keys being sorted are the integers $\{1, 2, \ldots, N\}$.) If $s = 1$, it is easy to see what happens during the first stage of partitioning: Step Q3 is performed once, step Q4 is performed $N$ times, and then step Q5 takes us to Q7. So the contributions of the first stage in this case are $A = 1$, $B = 0$, $C = N + 1$. A similar but slightly more complicated argument when $s > 1$ (see exercise 21) shows that the contributions of the first stage to the total running time are, in general,

$$A = 1, \qquad B = t, \qquad C = N + 1, \qquad \text{for } 1 \le s \le N. \qquad (17)$$

To this we must add the contributions of the later stages, which sort subfiles of $s - 1$ and $N - s$ elements, respectively.
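Equation (17) can be confirmed exhaustively for small $N$ by running the first stage on every permutation. The sketch assumes numeric keys, with `math.inf` values standing in for the artificial keys of (13). It counts one comparison per execution of Q3 or Q4. `first_stage_counts` is a hypothetical name.

```python
import math
from itertools import permutations


def first_stage_counts(perm):
    """Run steps Q2-Q6 once on a permutation of 1..N, with the artificial keys of (13).
    Returns (C, B): comparisons made in Q3 and Q4, and exchanges made in Q6."""
    n = len(perm)
    K = [-math.inf] + list(perm) + [math.inf]
    pivot, i, j = K[1], 1, n + 1
    C = B = 0
    while True:
        i += 1
        C += 1
        while K[i] < pivot:            # Q3 repeats while K_i < K
            i += 1
            C += 1
        j -= 1
        C += 1
        while pivot < K[j]:            # Q4 repeats while K < K_j
            j -= 1
            C += 1
        if j <= i:                     # Q5 sends us on to Q7
            return C, B
        K[i], K[j] = K[j], K[i]        # Q6
        B += 1


N = 6
ok = True
for perm in permutations(range(1, N + 1)):
    s = perm[0]
    t = sum(1 for key in perm[:s] if key > s)   # keys among K_1..K_s that exceed s
    ok = ok and first_stage_counts(perm) == (N + 1, t)
print(ok)
```

It prints `True`. Every comparison count is exactly $N + 1$, and every exchange count equals $t$, as (17) states.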

If we assume that the original file is in random order, it is now possible to write down formulas that define the generating functions for the probability distributions of $A, B, \ldots, S$ (see exercise 22). But for simplicity we shall consider here only the average values of these quantities, $A_N, B_N, \ldots, S_N$, as functions of $N$. Consider, for example, the average number of comparisons, $C_N$, that occur during the partitioning process. When $N \le M$, $C_N = 0$. Otherwise, since any given value of $s$ occurs with probability $1/N$, we have

$$C_N = \frac{1}{N}\sum_{s=1}^{N}\bigl(N + 1 + C_{s-1} + C_{N-s}\bigr) = N + 1 + \frac{2}{N}\sum_{0 \le k < N} C_k, \qquad \text{for } N > M. \qquad (18)$$
Similar formulas hold for other quantities $A_N$, $B_N$, $D_N$, $E_N$, $S_N$ (see exercise 23). There is a simple way to solve recurrence relations of the form

$$x_n = f_n + \frac{2}{n}\sum_{0 \le k < n} x_k, \qquad \text{for } n \ge m. \qquad (19)$$

The first step is to get rid of the summation sign: Since

$$(n+1)x_{n+1} = (n+1)f_{n+1} + 2\sum_{0 \le k \le n} x_k, \qquad n x_n = n f_n + 2\sum_{0 \le k < n} x_k,$$

we may subtract, obtaining

$$(n+1)x_{n+1} - n x_n = g_n + 2x_n, \qquad \text{where } g_n = (n+1)f_{n+1} - n f_n.$$

Now the recurrence takes the much simpler form

$$(n+1)x_{n+1} = (n+2)x_n + g_n, \qquad \text{for } n \ge m. \qquad (20)$$

Any recurrence relation that has the general form

$$a_n x_{n+1} = b_n x_n + g_n \qquad (21)$$

can be reduced to a summation if we multiply both sides by the “summation factor” $a_0 a_1 \ldots a_{n-1}/b_0 b_1 \ldots b_n$; we obtain

$$y_{n+1} = y_n + c_n, \qquad \text{where } y_n = \frac{a_0 \ldots a_{n-1}}{b_0 \ldots b_{n-1}}\,x_n, \quad c_n = \frac{a_0 \ldots a_{n-1}}{b_0 b_1 \ldots b_n}\,g_n. \qquad (22)$$

In our case (20), the summation factor is simply $n!/(n+2)! = 1/(n+1)(n+2)$, so we find that the simple relation

$$\frac{x_{n+1}}{n+2} = \frac{x_n}{n+1} + \frac{(n+1)f_{n+1} - n f_n}{(n+1)(n+2)}, \qquad \text{for } n \ge m, \qquad (23)$$

is a consequence of (19).
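The reduction from (19) to (23) can be checked numerically with exact rational arithmetic. The sketch below evaluates (19) literally from arbitrary starting values $x_0, \ldots, x_{m-1}$, then iterates (23) upward from the same $x_m$, and compares the two. The function names and the starting values are our own illustrative choices.

```python
from fractions import Fraction


def solve_direct(f, m, init, n_max):
    """Evaluate (19) literally: x_n = f(n) + (2/n) * (x_0 + ... + x_{n-1}) for m <= n <= n_max.
    `init` supplies the arbitrary starting values x_0, ..., x_{m-1}; requires m >= 1."""
    x = [Fraction(v) for v in init]
    for n in range(m, n_max + 1):
        x.append(Fraction(f(n)) + Fraction(2, n) * sum(x))
    return x


def solve_by_summation_factor(f, m, x_m, n_max):
    """Iterate (23) upward from x_m: x_{n+1}/(n+2) = x_n/(n+1) + g_n/((n+1)(n+2)),
    where g_n = (n+1) f(n+1) - n f(n). Returns {n: x_n} for m <= n <= n_max."""
    ratio = Fraction(x_m) / (m + 1)   # the running value of x_n/(n+1)
    result = {m: Fraction(x_m)}
    for n in range(m, n_max):
        g = (n + 1) * Fraction(f(n + 1)) - n * Fraction(f(n))
        ratio += g / ((n + 1) * (n + 2))
        result[n + 1] = ratio * (n + 2)
    return result


def f_linear(n):
    return n + 1          # the case f_n = n + 1 worked out in the text


direct = solve_direct(f_linear, 3, [5, -2, 7], 20)
via_23 = solve_by_summation_factor(f_linear, 3, direct[3], 20)
print(all(direct[n] == via_23[n] for n in range(3, 21)))
```

It prints `True`. Replacing `f_linear` with $f_n = 1/n$ makes $g_n = 0$, so $x_n/(n+1)$ stays constant, which is the “unexpected result” discussed next.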

For example, if we set $f_n = 1/n$, we get the unexpected result $x_n/(n+1) = x_m/(m+1)$ for all $n \ge m$. If we set $f_n = n + 1$, we get

$$\frac{x_n}{n+1} = \frac{2}{n+1} + \frac{2}{n} + \cdots + \frac{2}{m+2} + \frac{x_m}{m+1} = 2(H_{n+1} - H_{m+1}) + \frac{x_m}{m+1},$$

for all $n \ge m$. Thus we obtain the solution to (18) by setting $m = M + 1$ and $x_n = 0$ for $n \le M$; the required formula is

$$C_N = (N+1)\bigl(2H_{N+1} - 2H_{M+2} + 1\bigr) \approx 2(N+1)\ln\frac{N+1}{M+2}, \qquad \text{for } N > M. \qquad (24)$$


Exercise 6.2.2–8 proves that, when $M = 1$, the standard deviation of $C_N$ is asymptotically $\sqrt{(21 - 2\pi^2)/3}\,N$; this is reasonably small compared to (24).
The other quantities can be found in a similar way (see exercise 23); when $N > M$ we have

$$\begin{aligned}
A_N &= 2(N+1)/(M+2) - 1,\\
B_N &= \tfrac16 (N+1)\bigl(2H_{N+1} - 2H_{M+2} + 1 - 6/(M+2)\bigr) + \tfrac12,\\
D_N &= (N+1)\bigl(1 - 2H_{M+1}/(M+2)\bigr),\\
E_N &= \tfrac16 (N+1)M(M-1)/(M+2);\\
S_N &= (N+1)/(2M+3) - 1, \qquad \text{for } N > 2M + 1.
\end{aligned} \qquad (25)$$
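Three of these closed forms can be compared with their recurrences using exact fractions. By (17), each stage contributes 1 to $A$ and $N + 1$ to $C$. For $B$ we use a first-stage average of our own derivation, not one quoted from the text. Given $K_1 = s$, the other $s - 1$ positions among $K_1, \ldots, K_s$ hold a random selection of the remaining keys, so $t$ averages $(s-1)(N-s)/(N-1)$. Averaging over $s$ gives $(N-2)/6$. The helper names are hypothetical.

```python
from fractions import Fraction


def harmonic(n):
    """H_n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def averages(first_stage, M, N_max):
    """Average-case recurrence shaped like (18): x_n = 0 for n <= M and
    x_n = first_stage(n) + (2/n) * (x_0 + ... + x_{n-1}) for n > M."""
    x = [Fraction(0)] * (M + 1)
    for n in range(M + 1, N_max + 1):
        x.append(Fraction(first_stage(n)) + Fraction(2, n) * sum(x))
    return x


def closed_A(N, M):
    return Fraction(2 * (N + 1), M + 2) - 1


def closed_B(N, M):
    return (Fraction(N + 1, 6) * (2 * harmonic(N + 1) - 2 * harmonic(M + 2) + 1
                                  - Fraction(6, M + 2)) + Fraction(1, 2))


def closed_C(N, M):
    return (N + 1) * (2 * harmonic(N + 1) - 2 * harmonic(M + 2) + 1)


M, N_max = 9, 60
A = averages(lambda n: 1, M, N_max)                   # one stage per call, by (17)
B = averages(lambda n: Fraction(n - 2, 6), M, N_max)  # average of t over s (derived above)
C = averages(lambda n: n + 1, M, N_max)               # N + 1 comparisons, by (17)
print(all(A[N] == closed_A(N, M) and B[N] == closed_B(N, M) and C[N] == closed_C(N, M)
          for N in range(M + 1, N_max + 1)))
```

It prints `True`. The agreement for $B_N$ also supports the remark, made later, that exchanges run at about one sixth of the comparisons.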

The  discussion  above  shows  that  it  is  possible  to  carry  out  an  exact  analysis 
of  the  average  running  time  of  a fairly  complex  program,  by  using  techniques 
that  we  have  previously  applied  only  to  simpler  cases. 

Formulas (24) and (25) can be used to determine the best value of $M$ on a particular computer. In MIX’s case, Program Q requires $(35/3)(N+1)H_{N+1} + \frac16(N+1)f(M) - 34.5$ units of time on the average, for $N > 2M + 1$, where

$$f(M) = 8M - 70H_{M+2} + 71 - 36\frac{H_{M+1}}{M+2} + \frac{270}{M+2} + \frac{54}{2M+3}. \qquad (26)$$

We want to choose $M$ so that $f(M)$ is a minimum, and a simple computer calculation shows that $M = 9$ is best. The average running time of Program Q is approximately $11.667(N+1)\ln N - 1.74N - 18.74$ units when $M = 9$, for large $N$.
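The “simple computer calculation” might look like the sketch below. It minimizes $f(M)$ from (26) over $1 \le M \le 50$ in floating point, then recovers the two constants in the asymptotic running time. To do that it uses the expansion $(N+1)H_{N+1} = (N+1)(\ln N + \gamma) + 3/2 + O(1/N)$, a standard asymptotic identity that we supply here; the text does not spell it out.

```python
def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))


def f(M):
    """The function f(M) of (26)."""
    return (8 * M - 70 * harmonic(M + 2) + 71 - 36 * harmonic(M + 1) / (M + 2)
            + 270 / (M + 2) + 54 / (2 * M + 3))


GAMMA = 0.5772156649015329        # Euler's constant
best = min(range(1, 51), key=f)
# (35/3)(N+1)H_{N+1} + (N+1)f(M)/6 - 34.5 = 11.667(N+1) ln N + coef*(N+1) + const + O(1/N)
coef = 35 / 3 * GAMMA + f(best) / 6
const = 35 / 3 * 1.5 - 34.5
print(best, round(f(best), 2), round(coef, 2), round(const + coef, 2))
```

It prints `9 -50.86 -1.74 -18.74`. Since `coef*(N+1)` contributes `coef` to the constant term, these are the coefficients $-1.74N - 18.74$ quoted in the text.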

So  Program  Q is  quite  fast,  on  the  average,  considering  that  it  requires  very 
little  memory  space.  Its  speed  is  primarily  due  to  the  fact  that  the  inner  loops, 
in  steps  Q3  and  Q4,  are  extremely  short  — only  three  MIX  instructions  each  (see 
lines  12-14  and  15-17).  The  number  of  exchanges,  in  step  Q6,  is  only  about 
1/6  of  the  number  of  comparisons  in  steps  Q3  and  Q4;  hence  we  have  saved  a 
significant  amount  of  time  by  not  comparing  i to  j in  the  inner  loops. 

But what is the worst case of Algorithm Q? Are there some inputs that it does not handle efficiently? The answer to this question is quite embarrassing: If the original file is already in order, with $K_1 < K_2 < \cdots < K_N$, each “partitioning” operation is almost useless, since it reduces the size of the subfile by only one element! So this situation (which ought to be easiest of all to sort) makes quicksort anything but quick; the sorting time becomes proportional to $N^2$ instead of $N \lg N$. (See exercise 25.) Unlike the other sorting methods we have seen, Algorithm Q likes a disordered file.

Hoare suggested two ways to remedy the situation, in his original paper, by
choosing a better value of the test key $K$ that governs the partitioning. One of
his recommendations was to choose a random integer $q$ between $l$ and $r$ in the
last part of step Q2; we can change the instruction “$K \leftarrow K_l$” to

$$K \leftarrow K_q, \quad R \leftarrow R_q, \quad R_q \leftarrow R_l, \quad R_l \leftarrow R \qquad (27)$$

in that step. (The last assignment “$R_l \leftarrow R$” is necessary; otherwise step Q4
would stop with $j = l - 1$ when $K$ is the smallest key of the subfile being
partitioned.) According to Eqs. (25), such random integers need to be calculated
only $2(N + 1)/(M + 2) - 1$ times on the average, so the additional running time
is not substantial; and the random choice gives good protection against the
occurrence of the worst case. Even a mildly random choice of $q$ should be safe.
Exercise 42 proves that, with truly random $q$, the probability of more than, say,
$20N \ln N$ comparisons will surely be less than $10^{-8}$.
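The contrast between an ordered file and a randomly chosen $q$ shows up even in a crude experiment. The Python sketch below is a simplified model rather than Program Q: it assumes distinct keys, charges $n - 1$ comparisons for partitioning a subfile of $n$ keys (Algorithm Q makes a few more), and uses a hypothetical `choose_pivot` hook to stand for the choice of $q$.

```python
import random

def quicksort_comparisons(keys, choose_pivot):
    """Count comparisons in a simplified quicksort model.

    Assumes distinct keys; partitioning a subfile of n keys is charged
    n - 1 comparisons against the test key K.
    """
    comparisons = 0
    stack = [list(keys)]                 # subfiles still waiting to be partitioned
    while stack:
        sub = stack.pop()
        if len(sub) <= 1:
            continue
        q = choose_pivot(len(sub))       # position of the test key within sub
        k = sub[q]
        rest = sub[:q] + sub[q + 1:]
        comparisons += len(rest)
        stack.append([x for x in rest if x < k])
        stack.append([x for x in rest if x > k])
    return comparisons

N = 200
ordered = list(range(N))
first_key = lambda n: 0                  # K <- K_l, the original choice
rng = random.Random(42)
random_key = lambda n: rng.randrange(n)  # q chosen at random, as in (27)

print(quicksort_comparisons(ordered, first_key))   # N(N-1)/2 = 19900
print(quicksort_comparisons(ordered, random_key))  # on the order of 2N ln N
```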

Hoare’s second suggestion was to look at a small sample of the file and to
choose a median value of the sample. This approach was adopted by R. C.
Singleton [CACM 12 (1969), 185-187], who suggested letting $K_q$ be the median
of the three values

$$K_l, \quad K_{\lfloor (l+r)/2 \rfloor}, \quad K_r. \qquad (28)$$

Singleton’s procedure cuts the number of comparisons down from $2N \ln N$ to
about $\frac{12}{7} N \ln N$ (see exercise 29). It can be shown that $B_N$ is asymptotically
$C_N/5$ instead of $C_N/6$ in this case, so the median method slightly increases the
amount of time spent in transferring the data; the total running time therefore
decreases by roughly 8 percent. (See exercise 56 for a detailed analysis.) The
worst case is still of order $N^2$, but such slow behavior will hardly ever occur.
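A minimal Python sketch of Singleton’s choice (28), using Python’s 0-based indices, together with the same kind of simplified comparison count as before (distinct keys assumed, $n - 1$ comparisons charged per partitioning stage, the comparisons needed to find the median ignored). The function names are hypothetical. On an already ordered file the median of three is the true median of the subfile, so the quadratic behavior disappears for that input.

```python
def median_of_three(keys, l, r):
    """Index q such that keys[q] is the median of K_l, K_floor((l+r)/2), K_r."""
    m = (l + r) // 2
    # Sort the three (key, index) pairs and return the index of the middle one.
    return sorted([(keys[l], l), (keys[m], m), (keys[r], r)])[1][1]

def comparisons_with_median(keys):
    """Simplified count: n - 1 comparisons per partitioning of n distinct keys."""
    total, stack = 0, [list(keys)]
    while stack:
        sub = stack.pop()
        if len(sub) <= 1:
            continue
        k = sub[median_of_three(sub, 0, len(sub) - 1)]
        total += len(sub) - 1
        stack.append([x for x in sub if x < k])
        stack.append([x for x in sub if x > k])
    return total

print(median_of_three([5, 1, 9, 3, 7], 0, 4))    # 4, since keys[4] = 7 is the median of 5, 9, 7
print(comparisons_with_median(list(range(200))))  # roughly N lg N rather than N(N-1)/2
```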

W. D. Frazer and A. C. McKellar [JACM 17 (1970), 496-507] have suggested
taking a much larger sample consisting of $2^k - 1$ records, where $k$ is chosen so
that $2^k \approx N/\ln N$. The sample can be sorted by the usual quicksort method,
then inserted among the remaining records by taking $k$ passes over the file
(partitioning it into $2^k$ subfiles, bounded by the elements of the sample). Finally
the subfiles are sorted. The average number of comparisons required by such
a samplesort procedure is about the same as in Singleton’s median method,
when $N$ is in a practical range, but it decreases to the asymptotic value $N \lg N$
as $N \to \infty$.
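The samplesort idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Frazer and McKellar’s procedure verbatim: Python’s built-in `sorted` stands in for quicksort on the sample and on the subfiles, and a binary search with `bisect` (about $k$ comparisons per record) takes the place of the $k$ partitioning passes. The cutoff of 16 records is arbitrary.

```python
import bisect
import math
import random

def samplesort(records, rng=random):
    """Samplesort sketch: sort a random sample, bucket the rest, sort each bucket."""
    n = len(records)
    if n < 16:
        return sorted(records)
    # Choose k so that 2**k is roughly n / ln n.
    k = max(1, round(math.log2(n / math.log(n))))
    chosen = set(rng.sample(range(n), 2 ** k - 1))
    sample = sorted(records[i] for i in chosen)   # stands in for quicksorting the sample
    buckets = [[] for _ in range(len(sample) + 1)]
    for i, x in enumerate(records):
        if i not in chosen:
            # Binary search, about k comparisons: bucket j holds sample[j-1] < x <= sample[j].
            buckets[bisect.bisect_left(sample, x)].append(x)
    out = []
    for s, bucket in zip(sample, buckets):
        out.extend(sorted(bucket))                 # each subfile is sorted separately
        out.append(s)
    out.extend(sorted(buckets[-1]))
    return out

data = [random.randrange(1000) for _ in range(500)]
print(samplesort(data) == sorted(data))   # True
```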

An absolute guarantee of $O(N \log N)$ sorting time in the worst case, together
with fast running time on the average, can be obtained by combining quicksort
with other schemes. For example, D. R. Musser [Software Practice & Exper. 27
(1997), 983-993] has suggested adding a “depth of partitioning” component to
each entry on quicksort’s stack. If any subfile is found to have been subdivided
more than, say, $2 \lg N$ times, we can abandon Algorithm Q and switch to
Algorithm 5.2.3H. The inner loop time remains unchanged, so the average total
running time remains almost the same as before.
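Musser’s safeguard (which he called introsort) might look like this in Python. Assumptions: a plain partitioning step with the rightmost key as test key (deliberately without Hoare’s remedies, so that an ordered file triggers the safeguard), a depth limit of $2\lfloor \lg N \rfloor$, and Python’s `heapq` module as a stand-in for heapsort (Algorithm 5.2.3H), applied to a copy of the subfile rather than in place.

```python
import heapq
import math

def introsort(keys):
    """Quicksort with a limit on partitioning depth; returns (sorted list, fallbacks)."""
    a = list(keys)
    limit = 2 * math.floor(math.log2(max(len(a), 2)))   # about 2 lg N
    fallbacks = 0
    stack = [(0, len(a) - 1, 0)]        # entries (l, r, depth of partitioning)
    while stack:
        l, r, depth = stack.pop()
        if l >= r:
            continue
        if depth > limit:
            # Subdivided too often: heap-sort this subfile instead
            # (heapq stands in for Algorithm 5.2.3H).
            heap = a[l:r + 1]
            heapq.heapify(heap)
            a[l:r + 1] = [heapq.heappop(heap) for _ in range(len(heap))]
            fallbacks += 1
            continue
        # Partition around the rightmost key (Lomuto-style, no safeguards).
        k, i = a[r], l
        for j in range(l, r):
            if a[j] < k:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[r] = a[r], a[i]
        stack.append((l, i - 1, depth + 1))
        stack.append((i + 1, r, depth + 1))
    return a, fallbacks

result, fallbacks = introsort(range(1000))     # an ordered file
print(result == list(range(1000)), fallbacks)  # True 1
```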

Robert Sedgewick has analyzed a number of optimized variants of quicksort
in Acta Informatica 7 (1977), 327-356, and in CACM 21 (1978), 847-857;
22 (1979), 368. See also J. L. Bentley and M. D. McIlroy, Software Practice
& Exper. 23 (1993), 1249-1265, for a version of quicksort that has been tuned
up to fit the UNIX® software library, based on 15 further years of experience.

Radix  exchange.  We  come  now  to  a method  that  is  quite  different  from 
any  of  the  sorting  schemes  we  have  seen  before;  it  makes  use  of  the  binary 
representation  of  the  keys,  so  it  is  intended  only  for  binary  computers.  Instead 
of  comparing  two  keys  with  each  other,  this  method  inspects  individual  bits  of 
the  keys,  to  see  if  they  are  0 or  1.  In  other  respects  it  has  the  characteristics  of 
exchange  sorting,  and,  in  fact,  it  is  rather  similar  to  quicksort.  Since  it  depends 
on  radix  2 representations,  we  call  it  “radix  exchange  sorting.”  The  algorithm 
can  be  described  roughly  as  follows: 

i) Sort the sequence on its most significant binary bit, so that all keys that
have a leading 0 come before all keys that have a leading 1. This sorting is done
by finding the leftmost key $K_i$ that has a leading 1, and the rightmost key $K_j$
with a leading 0. Then $R_i$ and $R_j$ are exchanged and the process is repeated
until $i > j$.

ii) Let $F_0$ be the elements with leading bit 0, and let $F_1$ be the others. Apply
the radix exchange sorting method to $F_0$ (starting now at the second bit from
the left instead of the most significant bit), until $F_0$ is completely sorted; then
do the same for $F_1$.

For  example,  Table  3 shows  how  the  radix  exchange  sort  acts  on  our  16 
random  numbers,  which  have  been  converted  to  octal  notation.  Stage  1 in  the 
table shows the initial input, and after exchanging on the first bit we get to
stage 2. Stage 2 sorts the first group on bit 2, and stage 3 works on bit 3. (The
reader should mentally convert the octal notation to 10-bit binary numbers. For
example, 0232 stands for $(0\,010\,011\,010)_2$.) When we reach stage 5, after sorting
on  bit  4,  we  find  that  each  group  remaining  has  but  a single  element,  so  this  part 
of  the  file  need  not  be  further  examined.  The  notation  “4[0232  0252]”  means 
that  the  subfile  0232  0252  is  waiting  to  be  sorted  on  bit  4 from  the  left.  In  this 
particular  case,  no  progress  occurs  when  sorting  on  bit  4;  we  need  to  go  to  bit  5 
before  the  items  are  separated. 
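The bit numbering used in Table 3 counts from the left, so “bit $i$” of an $m$-bit key $K$ is $\lfloor K/2^{m-i} \rfloor \bmod 2$. A small Python check of the example just given, with hypothetical helper names:

```python
def bit(key, i, m=10):
    """Bit i of an m-bit key, counting from the left (bit 1 is the most significant)."""
    return (key >> (m - i)) & 1

def bits(key, m=10):
    """All m bits of key as a string, most significant first."""
    return "".join(str(bit(key, i, m)) for i in range(1, m + 1))

def first_differing_bit(x, y, m=10):
    """Smallest i such that bit i of x and y differ, or None if x == y."""
    for i in range(1, m + 1):
        if bit(x, i, m) != bit(y, i, m):
            return i
    return None

print(bits(0o232), bits(0o252))           # 0010011010 0010101010
print(first_differing_bit(0o232, 0o252))  # 5
```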

The  complete  sorting  process  shown  in  Table  3 takes  22  stages,  somewhat 
more  than  the  comparable  number  for  quicksort  (Table  2).  Similarly,  the  number 
of  bit  inspections,  82,  is  rather  high;  but  we  shall  see  that  the  number  of  bit 
inspections  for  large  N is  actually  less  than  the  number  of  comparisons  made 
by  quicksort,  assuming  a uniform  distribution  of  keys.  The  total  number  of 
exchanges  in  Table  3 is  17,  which  is  quite  reasonable.  Note  that  bit  inspections 
never  have  to  go  past  bit  7 here,  although  10-bit  numbers  are  being  sorted. 

As  in  quicksort,  we  can  use  a stack  to  keep  track  of  the  “boundary  line 
information”  for  waiting  subfiles.  Instead  of  sorting  the  smallest  subfile  first,  it 
is  convenient  simply  to  go  from  left  to  right,  since  the  stack  size  in  this  case 
can  never  exceed  the  number  of  bits  in  the  keys  being  sorted.  In  the  following 
algorithm  the  stack  entry  (r,  b)  is  used  to  indicate  the  right  boundary  r of  a 
subfile  waiting  to  be  sorted  on  bit  b;  the  left  boundary  need  not  actually  be 
recorded  in  the  stack  — it  is  implicit  because  of  the  left-to-right  nature  of  the 
procedure. 

Algorithm R (Radix exchange sort). Records $R_1, \ldots, R_N$ are rearranged in
place; after sorting is complete, their keys will be in order, $K_1 \le \cdots \le K_N$. Each
key is assumed to be a nonnegative $m$-bit binary number, $(a_1 a_2 \ldots a_m)_2$; the $i$th
most significant bit, $a_i$, is called “bit $i$” of the key. An auxiliary stack with
room for at most $m - 1$ entries is needed for temporary storage. This algorithm
essentially follows the radix exchange partitioning procedure described in the
text above; certain improvements in its efficiency are possible, as described in
the text and exercises below. The radix exchange method looks precisely once at
every bit that is needed to determine the final order of the keys.

R1. [Initialize.] Set the stack empty, and set $l \leftarrow 1$, $r \leftarrow N$, $b \leftarrow 1$.

R2. [Begin new stage.] (We now wish to sort the subfile $R_l \ldots R_r$ on bit $b$;
from the nature of the algorithm, we have $l \le r$.) If $l = r$, go to step R10
(since a one-word file is already sorted). Otherwise set $i \leftarrow l$, $j \leftarrow r$.

R3. [Inspect $K_i$ for 1.] Examine bit $b$ of $K_i$. If it is a 1, go to step R6.

R4. [Increase $i$.] Increase $i$ by 1. If $i \le j$, return to step R3; otherwise go to
step R8.

R5. [Inspect $K_{j+1}$ for 0.] Examine bit $b$ of $K_{j+1}$. If it is a 0, go to step R7.

R6. [Decrease $j$.] Decrease $j$ by 1. If $i \le j$, go to step R5; otherwise go to
step R8.

R7. [Exchange $R_i$, $R_{j+1}$.] Interchange records $R_i \leftrightarrow R_{j+1}$; then go to step R4.

R8. [Test special cases.] (At this point a partitioning stage has been completed;
$i = j + 1$, bit $b$ of keys $K_l, \ldots, K_j$ is 0, and bit $b$ of keys $K_i, \ldots, K_r$ is 1.)
Increase $b$ by 1. If $b > m$, where $m$ is the total number of bits in the keys,
go to step R10. (In such a case, the subfile $R_l \ldots R_r$ has been sorted. This
test need not be made if there is no chance of having equal keys present in
the file.) Otherwise if $j < l$ or $j = r$, go back to step R2 (all bits examined
were 1 or 0, respectively). Otherwise if $j = l$, increase $l$ by 1 and go to
step R2 (there was only one 0 bit).

R9. [Put on stack.] Insert the entry $(r, b)$ on top of the stack; then set $r \leftarrow j$
and go to step R2.

R10. [Take off stack.] If the stack is empty, we are done sorting; otherwise set
$l \leftarrow r + 1$, remove the top entry $(r', b')$ of the stack, set $r \leftarrow r'$, $b \leftarrow b'$, and
return to step R2. |
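For readers who want to experiment, here is a rough Python transcription of Algorithm R. It is a sketch under stated assumptions: keys are nonnegative integers below $2^m$, the records are just the keys themselves, and the inner partitioning loop is written as an ordinary two-pointer scan instead of the $K_{j+1}$ bookkeeping of steps R3 through R7, although it stops in the same state that step R8 expects.

```python
def radix_exchange_sort(keys, m):
    """Sort nonnegative integers < 2**m by radix exchange, following steps R1-R10."""
    K = [None] + list(keys)              # K[1..N]; K[0] is unused
    N = len(K) - 1
    stack = []                           # entries (r, b), as in step R9
    l, r, b = 1, N, 1                    # R1. Initialize.

    def bit(x):
        return (x >> (m - b)) & 1        # bit b of x, counting from the left

    while True:
        if l < r:                        # R2. (l = r is a one-word file: go to R10)
            i, j = l, r
            while True:                  # R3-R7: partition K[l..r] on bit b
                while i <= j and bit(K[i]) == 0:
                    i += 1
                while i <= j and bit(K[j]) == 1:
                    j -= 1
                if i > j:
                    break
                K[i], K[j] = K[j], K[i]
            # R8. Now i = j + 1; bit b is 0 in K[l..j] and 1 in K[i..r].
            b += 1
            if b <= m:
                if j < l or j == r:      # all bits examined were 1, or all were 0
                    continue
                if j == l:               # only one 0 bit
                    l += 1
                    continue
                stack.append((r, b))     # R9. Put on stack.
                r = j
                continue
        if not stack:                    # R10. Take off stack.
            return K[1:]
        l = r + 1
        r, b = stack.pop()

print(radix_exchange_sort([0o232, 0o252, 0o017, 0o604], 10))   # [15, 154, 170, 388]
```

The stack holds only right boundaries and bit numbers, exactly as in the text; left boundaries are recovered in step R10 from the previous right boundary.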

Program R (Radix exchange sort). The following MIX code uses essentially the
same conventions as Program Q. We have rI1 $= l - r$, rI2 $= r$, rI3 $= i$, rI4 $= j$,
rI5 $= m - b$, rI6 $=$ size of stack, except that it proves convenient for certain
instructions (designated below) to leave rI3 $= i - j$ or rI4 $= j - i$. Because of
the binary nature of radix exchange, this program uses the operations SRB (shift
right AX binary), JAE (jump A even), and JAO (jump A odd), defined in Section
4.5.2. We assume that $N \ge 2$.


01  START  ENT6  0            1          R1. Initialize. Set stack empty.
02         ENT1  1-N          1          l ← 1.
03         ENT2  N            1          r ← N.
04         ENT5  M-1          1          b ← 1.
05         JMP   1F           1          To R2 (omit testing l = r).
06  9H     INC6  1            S          R9. Put on stack. [rI4 = j - l]
07         ST2   STACK,6(A)   S
08         ST5   STACK,6(B)   S          (r, b) ⇒ stack.
09         ENN1  0,4          S          rI1 ← l - j.
10         ENT2  -1,3         S          r ← j.
11  1H     ENT3  0,1          A          R2. Begin new stage. [rI3 = i - j]
12         ENT4  0,2          A          i ← l, j ← r. [rI3 = i - j]
13  3H     INC3  0,4          C'         R3. Inspect K_i for 1.
14         LDA   INPUT,3      C'
15         SRB   0,5          C'         units bit of rA ← bit b of K_i.
16         JAE   4F           C'         To R4 if it is 0.
17  6H     DEC4  1,3          C''+X      R6. Decrease j. j ← j - 1. [rI4 = j - i]
18         J4N   8F           C''+X      To R8 if j < i. [rI4 = j - i]
19  5H     INC4  0,3          C''        R5. Inspect K_{j+1} for 0.
20         LDA   INPUT+1,4    C''
21         SRB   0,5          C''        units bit of rA ← bit b of K_{j+1}.
22         JAO   6B           C''        To R6 if it is 1.
23  7H     LDA   INPUT+1,4    B          R7. Exchange R_i, R_{j+1}.
24         LDX   INPUT,3      B
25         STX   INPUT+1,4    B
26         STA   INPUT,3      B
27  4H     DEC3  -1,4         C'-X       R4. Increase i. i ← i + 1. [rI3 = i - j]
28         J3NP  3B           C'-X       To R3 if i ≤ j. [rI3 = i - j]
29         INC3  0,4          A-X        rI3 ← i.
30  8H     J5Z   0F           A          R8. Test special cases. [rI4 unknown]
31         DEC5  1            A-G        To R10 if b = m, else b ← b + 1.
32         ENT4  -1,3         A-G        rI4 ← j.
33         DEC4  0,2          A-G        rI4 ← j - r.
34         J4Z   1B           A-G        To R2 if j = r.
35         DEC4  0,1          A-G-R      rI4 ← j - l.
36         J4N   1B           A-G-R      To R2 if j < l.
37         J4NZ  9B           A-G-L-R    To R9 if j ≠ l.
38         INC1  1            K          l ← l + 1.
39  2H     J1NZ  1B           K+S        Jump if l ≠ r.
40  0H     ENT1  1,2          S+1        R10. Take off stack.
41         LD2   STACK,6(A)   S+1
42         DEC1  0,2          S+1
43         LD5   STACK,6(B)   S+1        stack ⇒ (r, b).
44         DEC6  1            S+1
45         J6NN  2B           S+1        To R2 if stack was nonempty. |

The  running  time  of  this  radix  exchange  program  depends  on 


A = number of stages encountered with $l < r$;

B = number of exchanges;

C = C' + C'' = number of bit inspections;

G = number of times $b > m$ in step R8;

K = number of times $b \le m$, $j = l$ in step R8;  (29)

L = number of times $b \le m$, $j < l$ in step R8;

R = number of times $b \le m$, $j = r$ in step R8;

S = number of times things are entered onto the stack;

X = number of times $j < i$ in step R6.
By Kirchhoff’s law, $S = A - G - K - L - R$, so the total running time comes to
$27A + 8B + 8C - 23G - 14K - 17L - 19R - X + 13$ units. The bit-inspection
loops can be made somewhat faster, as shown in exercise 34, at the expense of
a more complicated program. It is also possible to increase the speed of radix
exchange by using straight insertion whenever $r - l$ is sufficiently small, as we
did in Algorithm Q; but we shall not dwell on these refinements.

In  order  to  analyze  the  running  time  of  radix  exchange,  two  kinds  of  input 
data  suggest  themselves.  We  can 

i) assume that $N = 2^m$ and that the keys to be sorted are simply the integers
$0, 1, 2, \ldots, 2^m - 1$ in random order; or

ii) assume that $m = \infty$ (unlimited precision) and that the keys to be sorted
are independent uniformly distributed real numbers in $[0 \,.\,.\, 1)$.

The  analysis  of  case  (i)  is  relatively  easy,  so  it  has  been  left  as  an  exercise 
for  the  reader  (see  exercise  35).  Case  (ii)  is  comparatively  difficult,  so  it  has 
also  been  left  as  an  exercise  (see  exercise  38).  The  following  table  shows  crude 
approximations  to  the  results  of  these  analyses: 


Quantity 

Case  (i) 

Case  (ii) 

A 

N 

aN 

B 

\NlgN 

\NlgN 

C 

NlgN 

NlgN 

G 

-N 

21 

0 

K 

0 

I N 

21V 

L 

0 

±{a-l)N 

R 

0 

f(a-  1)N 

S 

-N 

2iv 

-N 

X 

I N 

2IV 

-N 

2iV 

Here $\alpha = 1/\ln 2 \approx 1.4427$. Notice that the average number of exchanges, bit
inspections, and stack accesses is essentially the same for both kinds of data,
even though case (ii) takes about 44 percent more stages. Our MIX program
takes approximately $14.4\,N \ln N$ units of time, on the average, to sort $N$ items
in case (ii), and this could be cut to about $11.5\,N \ln N$ using the suggestion of
exercise 34; the corresponding figure for Program Q is $11.7\,N \ln N$, which can be
decreased to about $10.6\,N \ln N$ using Singleton’s median-of-three suggestion.
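To see where the figure 14.4 comes from, one can substitute the case (ii) column of the table into the running-time formula for Program R. A small Python sketch, treating the table’s crude approximations as if they were exact (they are not); the function names are illustrative.

```python
import math

ALPHA = 1 / math.log(2)   # alpha = 1/ln 2, about 1.4427

def program_r_time(A, B, C, G, K, L, R, X):
    """Running time of Program R in MIX units, from the formula in the text."""
    return 27*A + 8*B + 8*C - 23*G - 14*K - 17*L - 19*R - X + 13

def case_ii_estimate(N):
    """Plug the crude case (ii) approximations from the table into the formula."""
    lg_N = math.log2(N)
    return program_r_time(A=ALPHA*N, B=N*lg_N/4, C=N*lg_N, G=0, K=N/2,
                          L=(ALPHA - 1)*N/2, R=(ALPHA - 1)*N/2, X=N/2)

# Only B and C grow like N lg N, so the coefficient of N ln N is (8/4 + 8)/ln 2.
leading = (8 / 4 + 8) / math.log(2)
print(round(leading, 2))   # 14.43
# The lower-order terms make this ratio approach the leading coefficient only slowly.
print(case_ii_estimate(10**6) / (10**6 * math.log(10**6)))
```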

Thus  radix  exchange  sorting  takes  about  as  long  as  quicksort,  on  the  average, 
when  sorting  uniformly  distributed  data;  on  some  machines  it  is  actually  a little 
quicker  than  quicksort.  Exercise  53  indicates  to  what  extent  the  process  slows 
down  for  a nonuniform  distribution.  It  is  important  to  note  that  our  entire 
analysis  is  predicated  on  the  assumption  that  keys  are  distinct;  radix  exchange 
as  defined  above  is  not  especially  efficient  when  equal  keys  are  present,  since  it 
goes  through  several  time-consuming  stages  trying  to  separate  sets  of  identical 
keys  before  b becomes  > m.  One  plausible  way  to  remedy  this  defect  is  suggested 
in  the  answer  to  exercise  40. 

Both radix exchange and quicksort are essentially based on the idea of
partitioning. Records are exchanged until the file is split into two parts: a
left-hand subfile, in which all keys are $\le K$ for some $K$, and a right-hand subfile
in which all keys are $\ge K$. Quicksort chooses $K$ to be an actual key in the
file, while radix exchange essentially chooses an artificial key $K$ based on binary
representations. From a historical standpoint, radix exchange was discovered by
P. Hildebrandt, H. Isbitz, H. Rising, and J. Schwartz [JACM 6 (1959), 156-163],
about a year earlier than quicksort. Other partitioning schemes are also possible;
for example, John McCarthy has suggested setting $K \approx \frac12(u + v)$, if all keys are
known to lie between $u$ and $v$. Yihsiao Wang has suggested that the mean of
three key values such as (28) be used as the threshold for partitioning; he has
proved that the number of comparisons required to sort uniformly distributed
random data will then be asymptotic to $1.082\,N \lg N$.

Still another partitioning strategy has been proposed by M. H. van Emden
[CACM 13 (1970), 563-567]: Instead of choosing $K$ in advance, we “learn”
what a good $K$ might be, by keeping track of $K' = \max(K_l, \ldots, K_i)$ and
$K'' = \min(K_j, \ldots, K_r)$ as partitioning proceeds. We may increase $i$ until encountering
a key greater than $K'$, then decrease $j$ until encountering a key less than $K''$, then
exchange and/or adjust $K'$ and $K''$. Empirical tests on this “interval-exchange
sort” method indicate that it is slightly slower than quicksort; its running time
appears to be so difficult to analyze that an adequate theoretical explanation
will never be found, especially since the subfiles after partitioning are no longer
in random order.

A generalization  of  radix  exchange  to  radices  higher  than  2 is  discussed  in 
Section  5.2.5. 

* Asymptotic methods. The analysis of exchange sorting algorithms leads to
some particularly instructive mathematical problems that enable us to learn
more about how to find the asymptotic behavior of functions. For example, we
came across the function

$$W_n = \frac{1}{n!}\sum_{0 \le r < s \le n} s!\, r^{n-s} \qquad (31)$$

in (9), during our analysis of the bubble sort; what is its asymptotic value?

We can proceed as in our study of the number of involutions, Eq. 5.1.4-(41);
the reader will find it helpful to review the discussion at the end of Section 5.1.4
before reading further.

Inspection of (31) shows that the contribution for $s = n$ is larger than that
for $s = n - 1$, etc.; this suggests replacing $s$ by $n - s$. In fact, we soon discover
that it is most convenient to use the substitutions $t = n - s + 1$, $m = n + 1$, so
that (31) becomes

$$\frac{1}{m} W_{m-1} = \frac{1}{m!}\sum_{1 \le t \le m} (m - t)! \sum_{0 \le r < m - t} r^{t-1}. \qquad (32)$$
The inner sum has a well-known asymptotic series obtained from Euler’s
summation formula, namely

$$\sum_{0 \le r < N} r^{t-1} = \frac{N^t}{t} - \frac12\bigl(N^{t-1} - \delta_{t1}\bigr) + \frac{B_2}{2}(t - 1)\bigl(N^{t-2} - \delta_{t2}\bigr) + \cdots$$
$$\qquad = \frac{1}{t}\sum_{j=0}^{k} \binom{t}{j} B_j \bigl(N^{t-j} - \delta_{tj}\bigr) + O(N^{t-k}) \qquad (33)$$

(see exercise 1.2.11.2-4); hence our problem reduces to studying sums of the
form

$$\frac{1}{m!}\sum_{1 \le t \le m} (m - t)!\,(m - t)^t\, t^k, \qquad k \ge -1. \qquad (34)$$

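As in Section 5.1.4 we can show that the value of this summand is negligible,
$O(\exp(-n^{\delta}))$, whenever $t$ is greater than $m^{1/2+\epsilon}$; hence we may put
$t = O(m^{1/2+\epsilon})$ and replace the factorials by Stirling’s approximation:

$$\frac{(m - t)!\,(m - t)^t}{m!} = \exp\Bigl(-\frac{t^2}{2m} + O\Bigl(\frac{t^3}{m^2}\Bigr) + O\Bigl(\frac{t}{m}\Bigr)\Bigr).$$

We are therefore interested in the asymptotic value of

$$r_k(m) = \sum_{1 \le t < m} e^{-t^2/2m}\, t^k, \qquad k \ge -1. \qquad (35)$$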

The sum could also be extended to the full range $1 \le t < \infty$ without changing
its asymptotic value, since the values for $t > m^{1/2+\epsilon}$ are negligible.

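Let $g_k(x) = x^k e^{-x^2}$ and $f_k(x) = g_k(x/\sqrt{2m})$. When $k \ge 0$, Euler’s
summation formula tells us that

$$\sum_{0 \le t < m} f_k(t) = \int_0^m f_k(x)\,dx + \sum_{j=1}^{p} \frac{B_j}{j!}\bigl(f_k^{(j-1)}(m) - f_k^{(j-1)}(0)\bigr) + \frac{(-1)^{p+1}}{p!}\int_0^m B_p(\{x\})\,f_k^{(p)}(x)\,dx$$
$$\qquad = \sqrt{2m}\int_0^\infty g_k(y)\,dy - \sum_{j=1}^{p} \frac{B_j}{j!}\,(2m)^{(1-j)/2}\,g_k^{(j-1)}(0) + O\bigl(m^{(1-p)/2}\bigr); \qquad (36)$$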

hence we can get an asymptotic series for $r_k(m)$ whenever $k \ge 0$ by using
essentially the same ideas we have used before. But when $k = -1$ the method
breaks down, since $f_{-1}(0)$ is undefined; we can’t merely sum from 1 to $m$ either,
because the remainders don’t give smaller and smaller powers of $m$ when the
lower limit is 1. (This is the crux of the matter, and the reader should pause to
appreciate the problem before proceeding further.)
To resolve the dilemma we can define $g_{-1}(x) = (e^{-x^2} - 1)/x$ and
$f_{-1}(x) = g_{-1}(x/\sqrt{2m})$; then $f_{-1}(0) = 0$, and $r_{-1}(m)$ can be obtained from
$\sum_{0 \le t < m} f_{-1}(t)$ in a simple way. Equation (36) is now valid for $k = -1$, and the remaining
integral is well known,

$$\int_0^{m/2} \frac{e^{-y} - 1}{y}\,dy = -\gamma - \ln m + \ln 2 + O(e^{-m/2}),$$

by exercise 43.
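A numerical check may help here. Since $\sum_{1 \le t < m} f_{-1}(t) = \sqrt{2m}\,\bigl(r_{-1}(m) - H_{m-1}\bigr)$, Euler’s formula with the integral above, together with $H_{m-1} = \ln m + \gamma + O(1/m)$, suggests $r_{-1}(m) = \frac12(\ln m + \gamma + \ln 2) + O(1/m)$. That consequence is worked out here rather than stated in the text. The Python sketch below compares the two sides; the value of $\gamma$ is typed in by hand.

```python
import math

EULER_GAMMA = 0.5772156649015329

def r_minus1(m):
    """r_{-1}(m) = sum_{1 <= t < m} exp(-t^2/(2m)) / t, Eq. (35) with k = -1."""
    return sum(math.exp(-t * t / (2 * m)) / t for t in range(1, m))

for m in (100, 10_000):
    approx = 0.5 * (math.log(m) + EULER_GAMMA + math.log(2))
    print(m, r_minus1(m), approx, r_minus1(m) - approx)
```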

Now we have enough facts and formulas to grind out the answer,

$$W_n = \tfrac12 m \ln m + \tfrac12(\gamma + \ln 2)m - \tfrac23\sqrt{2\pi m} + \tfrac{49}{36} + O(m^{-1/2}), \qquad m = n + 1, \qquad (37)$$

as shown in exercise 44. This completes our analysis of the bubble sort.
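Formula (37) can be checked against exact values of (31), which Python’s big integers compute without difficulty for moderate $n$. In this sketch the constant $\gamma$ is typed in, the usual convention $0^0 = 1$ is assumed for the term $r = 0$, $s = n$, and the comparison allows for the $O(m^{-1/2})$ error term.

```python
import math
from fractions import Fraction

EULER_GAMMA = 0.5772156649015329

def W(n):
    """W_n = (1/n!) * sum_{0 <= r < s <= n} s! r^(n-s), Eq. (31), computed exactly."""
    total = 0
    for s in range(1, n + 1):
        # Python evaluates 0 ** 0 as 1, matching the convention 0^0 = 1.
        total += math.factorial(s) * sum(r ** (n - s) for r in range(s))
    return Fraction(total, math.factorial(n))

def W_asymptotic(n):
    """The terms of (37) before the O(m^(-1/2)) error, with m = n + 1."""
    m = n + 1
    return (0.5 * m * math.log(m) + 0.5 * (EULER_GAMMA + math.log(2)) * m
            - (2 / 3) * math.sqrt(2 * math.pi * m) + 49 / 36)

for n in (10, 100, 300):
    print(n, float(W(n)), W_asymptotic(n))
```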

For the analysis of radix exchange sorting, we need to know the asymptotic
value of the finite sum

$$U_n = \sum_{k \ge 2} \binom{n}{k} (-1)^k \frac{1}{2^{k-1} - 1} \qquad (38)$$

as $n \to \infty$. This question turns out to be harder than any of the other asymptotic
problems we have met so far; the elementary methods of power series expansions,
Euler’s summation formula, etc., turn out to be inadequate. The following
derivation has been suggested by N. G. de Bruijn.

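To get rid of the cancellation effects of the large factors $\binom{n}{k}(-1)^k$ in (38),
we start by rewriting the sum as an infinite series

$$U_n = \sum_{k \ge 2} \binom{n}{k} (-1)^k \sum_{j \ge 1} \Bigl(\frac{1}{2^{k-1}}\Bigr)^j = \sum_{j \ge 1} \bigl(2^j (1 - 2^{-j})^n - 2^j + n\bigr). \qquad (39)$$

If we set $x = n/2^j$, the summand is

$$2^j (1 - 2^{-j})^n - 2^j + n = \frac{n}{x}\Bigl(\Bigl(1 - \frac{x}{n}\Bigr)^n - 1 + x\Bigr).$$

When $x \le n^{\epsilon}$, we have

$$\Bigl(1 - \frac{x}{n}\Bigr)^n = \exp\Bigl(n \ln\Bigl(1 - \frac{x}{n}\Bigr)\Bigr) = \exp\bigl(-x + x^2 O(n^{-1})\bigr), \qquad (40)$$

and this suggests approximating (39) by

$$T_n = \sum_{j \ge 1} \bigl(2^j e^{-n/2^j} - 2^j + n\bigr). \qquad (41)$$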


5.2.2 


SORTING  BY  EXCHANGING  131 


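To justify this approximation, we have $U_n - T_n = X_n + Y_n$, where

$$X_n = \sum_{\substack{j \ge 1 \\ 2^j < n^{1-\epsilon}}} \bigl(2^j (1 - 2^{-j})^n - 2^j e^{-n/2^j}\bigr) \qquad \text{[the terms for } x > n^{\epsilon}\text{]}$$
$$\quad = \sum_{\substack{j \ge 1 \\ 2^j < n^{1-\epsilon}}} O\bigl(n e^{-n/2^j}\bigr) \qquad \text{[since } 0 < 1 - 2^{-j} < e^{-2^{-j}}\text{]}$$
$$\quad = O\bigl(n \log n\, e^{-n^{\epsilon}}\bigr) \qquad \text{[since there are } O(\log n) \text{ terms]};$$

and

$$Y_n = \sum_{\substack{j \ge 1 \\ 2^j \ge n^{1-\epsilon}}} \bigl(2^j (1 - 2^{-j})^n - 2^j e^{-n/2^j}\bigr) \qquad \text{[the terms for } x \le n^{\epsilon}\text{]}$$
$$\quad = \sum_{\substack{j \ge 1 \\ 2^j \ge n^{1-\epsilon}}} e^{-n/2^j}\, O(n/2^j) \qquad \text{[by (40)]}.$$

Our discussion below will demonstrate that the latter sum is $O(1)$; consequently
$U_n - T_n = O(1)$. (See exercise 47.)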
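The claim $U_n - T_n = O(1)$ is easy to probe numerically. The Python sketch below evaluates (38) exactly with rational arithmetic, so the large alternating terms cancel without rounding error, and sums (41) in floating point, switching to a short Taylor series for $e^{-x} - 1 + x$ when $x$ is tiny; the cutoffs $10^{-4}$ and $10^{-15}$ are arbitrary choices.

```python
import math
from fractions import Fraction

def U(n):
    """U_n from (38), computed exactly with rational arithmetic."""
    return sum((Fraction(math.comb(n, k) * (-1) ** k, 2 ** (k - 1) - 1)
                for k in range(2, n + 1)), Fraction(0))

def T(n):
    """T_n from (41), summed in floating point until the terms are negligible."""
    total, j = 0.0, 1
    while True:
        x = n / 2 ** j
        if x < 1e-4:
            g = x * x / 2 - x ** 3 / 6 + x ** 4 / 24   # e^{-x} - 1 + x for tiny x
        else:
            g = math.expm1(-x) + x
        total += 2 ** j * g          # equals 2^j e^{-x} - 2^j + n, since n = 2^j x
        if x < 1e-15:
            return total
        j += 1

for n in (10, 50, 200):
    print(n, float(U(n)), T(n), float(U(n)) - T(n))
```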

So far we haven’t applied any techniques that are really different from those
we have used before. But the study of $T_n$ requires a new idea, based on simple
principles of complex variable theory: If $x$ is any positive number, we have

$$e^{-x} = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty} \Gamma(z)\,x^{-z}\,dz = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Gamma(\tfrac12 + it)\,x^{-1/2-it}\,dt. \qquad (42)$$

To prove this identity, consider the path of integration shown in Fig. 20(a), where
$N$, $N'$, and $M$ are large. The value of the integral along this contour is the sum
of the residues inside, namely

$$\sum_{0 \le k < M} x^k \lim_{z \to -k} (z + k)\Gamma(z) = \sum_{0 \le k < M} \frac{(-x)^k}{k!}.$$

The integral on the top line is $O\bigl(\int_{-\infty}^{1/2} |\Gamma(t + iN)|\,x^{-t}\,dt\bigr)$, and we have the
well-known bound

$$\Gamma(t + iN) = O\bigl(|t + iN|^{t-1/2} e^{-t-\pi N/2}\bigr) \qquad \text{as } N \to \infty.$$

[For properties of the gamma function see, for example, Erdélyi, Magnus, Oberhettinger,
and Tricomi, Higher Transcendental Functions 1 (New York: McGraw-Hill, 1953),
Chapter 1.] Therefore the top line integral is quite negligible,
$O\bigl(e^{-\pi N/2}\int_{-\infty}^{1/2} (N/xe)^t\,dt\bigr)$. The bottom line integral has a similar innocuous
behavior. For the integral along the left line we use the fact that

$$\Gamma(\tfrac12 + it - M) = \frac{\Gamma(\tfrac12 + it)}{(\tfrac12 - M + it)\cdots(-\tfrac32 + it)(-\tfrac12 + it)} = \Gamma(\tfrac12 + it)\,O\bigl(1/(M - 1)!\bigr);$$
[Fig. 20. Contours of integration for gamma-function identities, panels (a) and (b).]


hence the left-hand integral is $O\bigl(x^{M-1/2}/(M - 1)!\bigr)\int_{-\infty}^{\infty} |\Gamma(\tfrac12 + it)|\,dt$. Therefore
as $M, N, N' \to \infty$, only the right-hand integral survives, and this proves (42).
In fact, (42) remains valid if we replace $\frac12$ by any positive number.

The same argument can be used to derive many other useful relations
involving the gamma function. We can replace $x^{-z}$ by other functions of $z$;
or we can replace the constant $\frac12$ by other quantities. For example,

$$\frac{1}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty} \Gamma(z)\,x^{-z}\,dz = e^{-x} - 1 + x, \qquad (43)$$

and this is the critical quantity in our formula (41) for $T_n$:

$$T_n = n \sum_{j \ge 1} \frac{1}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty} \Gamma(z)\,(n/2^j)^{-1-z}\,dz. \qquad (44)$$

The sum may be placed inside the integrals, since its convergence is absolutely
well-behaved; we have

$$\sum_{j \ge 1} (n/2^j)^w = n^w \sum_{j \ge 1} (1/2^w)^j = n^w/(2^w - 1), \qquad \text{when } \Re(w) > 0,$$

because $|2^w| = 2^{\Re(w)} > 1$. Therefore

$$T_n = \frac{n}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty} \Gamma(z)\,\frac{n^{-1-z}}{2^{-1-z} - 1}\,dz, \qquad (45)$$

and it remains to evaluate the latter integral.

This time we integrate along a path that extends far to the right, as in
Fig. 20(b). The top line integral is $O\bigl(n^{1/2} e^{-\pi N/2}\int_{-3/2}^{M} |M + iN|^t\,dt\bigr)$, if $2^{iN} \ne 1$,
and the bottom line integral is equally negligible, when $N$ and $N'$ are much
larger than $M$. The right-hand line integral is $O\bigl(n^{-1-M}\int_{-\infty}^{\infty} |\Gamma(M + it)|\,dt\bigr)$.

Fixing $M$ and letting $N, N' \to \infty$ shows that $-T_n/n$ is $O(n^{-1-M})$ plus the sum
of the residues in the region $-3/2 < \Re(z) < M$. The factor $\Gamma(z)$ has simple poles
at $z = -1$ and $z = 0$, while $n^{-1-z}$ has no poles, and $1/(2^{-1-z} - 1)$ has simple
poles when $z = -1 + 2\pi i k/\ln 2$.
The double pole at $z = -1$ is the hardest to handle. We can use the well-known
relation

$$\Gamma(z + 1) = \exp\bigl(-\gamma z + \zeta(2)z^2/2 - \zeta(3)z^3/3 + \zeta(4)z^4/4 - \cdots\bigr),$$

where $\zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots$, to deduce the following expansions
when $w = z + 1$ is small:

$$\Gamma(z) = \frac{\Gamma(w + 1)}{w(w - 1)} = -w^{-1} + (\gamma - 1) + O(w),$$

$$n^{-1-z} = 1 - w \ln n + O(w^2),$$

$$1/(2^{-1-z} - 1) = -w^{-1}/\ln 2 - \tfrac12 + O(w).$$

The residue at $z = -1$ is the coefficient of $w^{-1}$ in the product of these three
formulas, namely $\frac12 - (\ln n + \gamma - 1)/\ln 2$. Adding the other residues gives the
formula
formula 


T, 


In  n + 7 


„±+Hn)  + l + 0(n-M), 

n In  2 2 n 

for  arbitrarily  large  M,  where  5(n)  is  a rather  strange  function, 


(46) 


j(»)  = 575  E 5R(T(  — 1 — 2nik/\n2)  exp(2niklgn)) . (47) 

n/  fc>  1 


Notice that $\delta(n) = \delta(2n)$. The average value of $\delta(n)$ is zero, since the average
value of each term is zero. (We may assume that $(\lg n) \bmod 1$ is uniformly
distributed, in view of the results about floating point numbers in Section 4.2.4.)
Furthermore, since $|\Gamma(-1 + it)| = |\pi/(t(1 + t^2)\sinh \pi t)|^{1/2}$, it is not difficult to
show that

$$|\delta(n)| < 0.000000173; \qquad (48)$$

thus we may safely ignore $\delta(n)$ for practical purposes. For theoretical purposes,
however, we can’t obtain a valid asymptotic expansion of $U_n$ without it; that is
why $U_n$ is a comparatively difficult function to analyze.
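The bound (48) follows from the absolute values of the terms in (47). The Python sketch below sums $\frac{2}{\ln 2}\sum_{k \ge 1} |\Gamma(-1 - 2\pi i k/\ln 2)|$ using the formula for $|\Gamma(-1 + it)|$ just quoted, which is unchanged when $t$ is replaced by $-t$. Only the first term matters in practice, since the terms decrease roughly like $e^{-\pi t/2}$; the cutoff of 10 terms is arbitrary.

```python
import math

def delta_bound(terms=10):
    """(2/ln 2) * sum_{k=1..terms} |Gamma(-1 - 2*pi*i*k/ln 2)|, an upper bound on |delta(n)|."""
    total = 0.0
    for k in range(1, terms + 1):
        t = 2 * math.pi * k / math.log(2)
        # |Gamma(-1 + it)| = sqrt(pi / (t (1 + t^2) sinh(pi t))) for t > 0
        total += math.sqrt(math.pi / (t * (1 + t * t) * math.sinh(math.pi * t)))
    return 2 / math.log(2) * total

print(delta_bound())   # about 1.725e-07
```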

From the definition of $T_n$ in (41) we can see immediately that

$$\frac{T_{2n}}{2n} - \frac{T_n}{n} = 1 - \frac{1}{n} + \frac{e^{-n}}{n}. \qquad (49)$$

Therefore the error term $O(n^{-M})$ in (46) is essential; it cannot be replaced by
zero. However, exercise 54 presents another approach to the analysis, which
avoids such error terms by deriving a rather peculiar convergent series.
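Both (46) and (49) can be checked numerically from the definition (41). In the Python sketch below, `T_over_n_asymptotic` is a hypothetical helper that drops $\delta(n)$ and the $O(n^{-M})$ term, so by (48) it should agree with $T_n/n$ to within about $2 \times 10^{-7}$ when $n$ is large; the summation cutoffs are arbitrary.

```python
import math

EULER_GAMMA = 0.5772156649015329

def T(n):
    """T_n from (41), summed in floating point until the terms are negligible."""
    total, j = 0.0, 1
    while True:
        x = n / 2 ** j
        if x < 1e-4:
            g = x * x / 2 - x ** 3 / 6 + x ** 4 / 24   # e^{-x} - 1 + x for tiny x
        else:
            g = math.expm1(-x) + x
        total += 2 ** j * g
        if x < 1e-15:
            return total
        j += 1

def T_over_n_asymptotic(n):
    """Right-hand side of (46) without delta(n) and the O(n^-M) term."""
    return (math.log(n) + EULER_GAMMA - 1) / math.log(2) - 0.5 + 2 / n

for n in (10, 100, 1000):
    print(n, T(n) / n, T_over_n_asymptotic(n))

# Eq. (49): T_{2n}/(2n) - T_n/n = 1 - 1/n + e^{-n}/n, checked here for n = 3.
n = 3
print(T(2 * n) / (2 * n) - T(n) / n, 1 - 1 / n + math.exp(-n) / n)
```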

In summary, we have deduced the behavior of the difficult sum (38):

$$U_n = n \lg n + n\Bigl(\frac{\gamma - 1}{\ln 2} - \frac12 + \delta(n)\Bigr) + O(1). \qquad (50)$$


The gamma-function method we have used to obtain this result is a special case
of the general technique of Mellin transforms, which are extremely useful in the
study of radix-oriented recurrence relations. Other examples of this approach
can be found in exercises 51-53 and in Section 6.3. An excellent introduction
to Mellin transforms and their applications to algorithmic analysis has been
presented by P. Flajolet, X. Gourdon, and P. Dumas in Theoretical Computer
Science 144 (1995), 3-58.

EXERCISES 

1. [M20] Let $a_1 \ldots a_n$ be a permutation of $\{1, \ldots, n\}$, and let $i$ and $j$ be indices such
that $i < j$ and $a_i > a_j$. Let $a'_1 \ldots a'_n$ be the permutation obtained from $a_1 \ldots a_n$ by
interchanging $a_i$ and $a_j$. Can $a'_1 \ldots a'_n$ have more inversions than $a_1 \ldots a_n$?

► 2. [M25] (a) What is the minimum number of exchanges that will sort the
permutation 3 7 6 9 8 1 4 5 2? (b) In general, given any permutation $\pi = a_1 \ldots a_n$ of $\{1, \ldots, n\}$,
let $\mathrm{xch}(\pi)$ be the minimum number of exchanges that will sort $\pi$ into increasing order.
Express $\mathrm{xch}(\pi)$ in terms of “simpler” characteristics of $\pi$. (See exercise 5.1.4-41 for
another way to measure the disorder of a permutation.)

3.  [10]  Is  the  bubble  sort  Algorithm  B a stable  sorting  algorithm? 

4.  [M23]  If  t = 1 in  step  B4,  we  could  actually  terminate  Algorithm  B immediately, 
because  the  subsequent  step  B2  will  do  nothing  useful.  What  is  the  probability  that 
t = 1 will  occur  in  step  B4  when  sorting  a random  permutation? 

5. [M25] Let $b_1 b_2 \ldots b_n$ be the inversion table for the permutation $a_1 a_2 \ldots a_n$. Show
that the value of BOUND after $r$ passes of the bubble sort is $\max\{b_i + i \mid b_i \ge r\} - r$, for
$0 \le r \le \max(b_1, \ldots, b_n)$.

6. [M22] Let $a_1 \ldots a_n$ be a permutation of $\{1, \ldots, n\}$ and let $a'_1 \ldots a'_n$ be its
inverse. Show that the number of passes to bubble-sort $a_1 \ldots a_n$ is
$1 + \max(a'_1 - 1, a'_2 - 2, \ldots, a'_n - n)$.

7. [M28] Calculate the standard deviation of the number of passes for the bubble
sort, and express it in terms of $n$ and the function $P(n)$. [See Eqs. (6) and (7).]

8.  [M24]  Derive  Eq.  (8). 

9. [M43] Analyze the number of passes and the number of comparisons in the
cocktail-shaker sorting algorithm. Note: See exercise 5.4.8-9 for partial information.

10. [M26] Let $a_1 a_2 \ldots a_n$ be a 2-ordered permutation of $\{1, 2, \ldots, n\}$.

a) What are the coordinates of the endpoints of the $a_i$th step of the corresponding
lattice path? (See Fig. 11 on page 87.)

b) Prove that the comparison/exchange of $a_1 : a_2$, $a_3 : a_4$, ... corresponds to folding
the path about the diagonal, as in Fig. 18(b).

c) Prove that the comparison/exchange of $a_2 : a_{2+d}$, $a_4 : a_{4+d}$, ... corresponds to
folding the path about a line $m$ units below the diagonal, as in Figs. 18(c), (d),
and (e), when $d = 2m - 1$.

► 11.  [M25]  What  permutation  of  {1,2,  ...,16}  maximizes  the  number  of  exchanges 
done  by  Batcher’s  algorithm? 

12. [24] Write a MIX program for Algorithm M, assuming that MIX is a binary
computer with the operations AND, SRB. How much time does your program take to sort
the sixteen records in Table 1?

13.  [10]  Is  Batcher’s  method  a stable  sorting  algorithm? 
14. [M21] Let $c(N)$ be the number of key comparisons used to sort $N$ elements by
Batcher’s method; this is the number of times step M4 is performed.

a) Show that $c(2^t) = 2c(2^{t-1}) + (t-1)2^{t-1} + 1$, for $t \ge 1$.

b) Find a simple expression for $c(2^t)$ as a function of $t$. Hint: Consider the sequence
$x_t = c(2^t)/2^t$.
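For experimenting with exercises 14 and 15, a small Python sketch can count the comparisons made by Batcher’s merge exchange. It assumes the usual formulation of Algorithm M from Section 5.2.2, which is not reproduced in this excerpt: $t = \lceil \lg N \rceil$, with $p$, $q$, $r$, $d$ updated as in steps M1-M6. The function names are ours, and list indices are 0-based, so the comparison $R_{i+1} : R_{i+d+1}$ of step M4 becomes `a[i] : a[i + d]`.

```python
def batcher_sort(keys):
    """Sort a copy of `keys` by Batcher's merge exchange.

    Returns (sorted_list, number_of_comparisons); the count is the number
    of times the compare/exchange step (M4) is performed.
    """
    a = list(keys)
    n = len(a)
    count = 0
    if n < 2:
        return a, count
    t = (n - 1).bit_length()              # t = ceil(lg N) when N >= 2
    p = 1 << (t - 1)                      # M1
    while p > 0:
        q, r, d = 1 << (t - 1), 0, p      # M2
        while True:
            for i in range(n - d):        # M3: all i with 0 <= i < N - d and i & p == r
                if i & p == r:
                    count += 1            # M4: compare/exchange a[i] : a[i + d]
                    if a[i] > a[i + d]:
                        a[i], a[i + d] = a[i + d], a[i]
            if q == p:                    # M5: this round of merging is finished
                break
            d, q, r = q - p, q // 2, p    # uses the old q on the right-hand side
        p //= 2                           # M6
    return a, count


def comparisons(n):
    """c(N); the count does not depend on the values of the keys."""
    return batcher_sort(range(n))[1]
```

For example, `comparisons(16)` gives 63, which is what part (a) predicts from $c(8) = 19$.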

15. [M38] The object of this exercise is to analyze the function $c(N)$ of exercise 14,
and to find a formula for $c(N)$ when $N = 2^{e_1} + 2^{e_2} + \cdots + 2^{e_r}$, $e_1 > e_2 > \cdots > e_r \ge 0$.

a) Let $a(N) = c(N+1) - c(N)$. Prove that $a(2n) = a(n) + \lfloor \lg(2n) \rfloor$, and $a(2n+1) = a(n) + 1$;
hence

$$a(N) = \binom{e_1+1}{2} - r(e_1 - 1) + (e_1 + e_2 + \cdots + e_r).$$

b) Let $x(n) = a(n) - a(\lfloor n/2 \rfloor)$, so that $a(n) = x(n) + x(\lfloor n/2 \rfloor) + x(\lfloor n/4 \rfloor) + \cdots$. Let
$y(n) = x(1) + x(2) + \cdots + x(n)$; and let $z(2n) = y(2n) - a(n)$, $z(2n+1) = y(2n+1)$.
Prove that $c(N+1) = z(N) + 2z(\lfloor N/2 \rfloor) + 4z(\lfloor N/4 \rfloor) + \cdots$.

c) Prove that $y(N) = N + (\lfloor N/2 \rfloor + 1)(e_1 - 1) - 2^{e_1} + 2$.

d) Now put everything together and find a formula for $c(N)$ in terms of the exponents
$e_j$, holding $r$ fixed.

16.  Find  the  asymptotic  value  of  the  average  number  of  exchanges  occurring 
when  Batcher’s  method  is  applied  to  a random  permutation  of  N distinct  elements, 
assuming  that  N is  a power  of  two. 

► 17. [20] Where in Algorithm Q do we use the fact that $K_0$ and $K_{N+1}$ have the values
postulated in (13)?

► 18. [20] Explain how the computation proceeds in Algorithm Q when all of the input
keys are equal. What would happen if the “<” signs in steps Q3 and Q4 were changed
to “≤” instead?

19.  [15]  Would  Algorithm  Q still  work  properly  if  a queue  (first-in-first-out)  were 
used  instead  of  a stack  (last-in-first-out)? 

20.  [M20]  What  is  the  largest  possible  number  of  elements  that  will  ever  be  on  the 
stack  at  once  in  Algorithm  Q,  as  a function  of  M and  N? 

21.  [20]  Explain  why  the  first  partitioning  phase  of  Algorithm  Q takes  the  number  of 
comparisons  and  exchanges  specified  in  (17),  when  the  keys  are  distinct. 

22. [M25] Let $p_{kN}$ be the probability that the quantity A in (16) will equal $k$, when
Algorithm Q is applied to a random permutation of $\{1, 2, \ldots, N\}$, and let
$A_N(z) = \sum_k p_{kN} z^k$ be the corresponding generating function. Prove that $A_N(z) = 1$ for $N \le M$,
and $A_N(z) = z\bigl(\sum_{1 \le s \le N} A_{s-1}(z) A_{N-s}(z)\bigr)/N$ for $N > M$. Find similar recurrence
relations defining the other probability distributions $B_N(z)$, $C_N(z)$, $D_N(z)$, $E_N(z)$,
$S_N(z)$.

23. [M23] Let $A_N$, $B_N$, $D_N$, $E_N$, $S_N$ be the average values of the corresponding
quantities in (16), when sorting a random permutation of $\{1, 2, \ldots, N\}$. Find recurrence
relations for these quantities, analogous to (18); and solve these recurrences to
obtain (25).

24. [M21] Algorithm Q obviously does a few more comparisons than it needs to, since
we can have $i = j$ in step Q3 and even $i > j$ in step Q4. How many comparisons $C_N$
would be done on the average if we avoided all comparisons when $i \ge j$?
25. [M20] When the input keys are the numbers 1 2 ... N in order, what are the exact
values of the quantities A, B, C, D, E, and S in the timing of Program Q? (Assume
that N > M.)

► 26.  [M24]  Construct  an  input  file  that  makes  Program  Q go  even  more  slowly  than 
it  does  in  exercise  25.  (Try  to  find  a really  bad  case.) 

27.  [M28]  (R.  Sedgewick.)  Consider  the  best  case  of  Algorithm  Q : Find  a permutation 
of  {1,2,...,  23}  that  takes  the  least  time  to  be  sorted  when  N = 23  and  M = 3. 

28. [M26] Find the recurrence relation analogous to (20) that is satisfied by the
average number of comparisons in Singleton’s modification of Algorithm Q (choosing
$s$ as the median of $\{K_1, K_{\lfloor (N+1)/2 \rfloor}, K_N\}$ instead of $s = K_1$). Ignore the comparisons
made when computing the median value $s$.

29. [HM40] Continuing exercise 28, find the asymptotic value of the number of
comparisons in Singleton’s “median of three” method.

► 30. [25] (P. Shackleton.) When multiword keys are being sorted, many sorting
methods become progressively slower as the file gets closer to its final order, since equal
and nearly-equal keys require an inspection of several words to determine the proper
lexicographic order. (See exercise 5-5.) Files that arise in practice often involve such
keys, so this phenomenon can have a significant impact on the sorting time.

Explain how Algorithm Q can be extended to avoid this difficulty; within a subfile
in which the leading $k$ words are known to have constant values for all keys, only the
$(k+1)$st words of the keys should be inspected.

► 31.  [20]  (C.  A.  R.  Hoare.)  Suppose  that,  instead  of  sorting  an  entire  file,  we  only 
want to determine the $m$th smallest of a given set of $n$ elements. Show that quicksort
can  be  adapted  to  this  purpose,  avoiding  many  of  the  computations  required  to  do  a 
complete  sort. 

32. [M40] Find a simple closed form expression for $C_{nm}$, the average number of key
comparisons required to select the $m$th smallest of $n$ elements by the “quickfind”
method of exercise 31. (For simplicity, let $M = 1$; that is, don’t assume the use of
a special technique for short subfiles.) What is the asymptotic behavior of $C_{(2m-1)m}$,
the average number of comparisons needed to find the median of $2m - 1$ elements by
Hoare’s method?

► 33.  [15]  Design  an  algorithm  that  rearranges  all  the  numbers  in  a given  table  so 
that  all  negative  values  precede  all  nonnegative  ones.  (The  items  need  not  be  sorted 
completely,  just  separated  between  negative  and  nonnegative.)  Your  algorithm  should 
use  the  minimum  possible  number  of  exchanges. 

34.  [20]  How  can  the  bit-inspection  loops  of  radix  exchange  (in  steps  R3  through  R6) 
be  speeded  up? 

35. [M23] Analyze the values of the frequencies A, B, C, G, K, L, R, S, and X that
arise in radix exchange sorting using “case (i) input.”

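36. [M27] Given a sequence of numbers $\langle a_n \rangle = a_0, a_1, a_2, \ldots$, define its binomial
transform $\langle \hat a_n \rangle = \hat a_0, \hat a_1, \hat a_2, \ldots$ by the rule

$$\hat a_n = \sum_k \binom{n}{k} (-1)^k a_k.$$

a) Prove that $\langle \hat{\hat a}_n \rangle = \langle a_n \rangle$.

b) Find the binomial transforms of the sequences $\langle 1 \rangle$; $\langle n \rangle$; $\langle \binom{n}{m} \rangle$, for fixed $m$; $\langle a^n \rangle$,
for fixed $a$; $\langle \binom{n}{m} a^n \rangle$, for fixed $a$ and $m$.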
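c) Suppose that a sequence $\langle x_n \rangle$ satisfies the relation

$$x_n = a_n + 2^{1-n} \sum_k \binom{n}{k} x_k, \quad \text{for } n \ge 2; \qquad x_0 = x_1 = a_0 = a_1 = 0.$$

Prove that the solution to this recurrence is

$$x_n = \sum_{k \ge 2} \binom{n}{k} (-1)^k \frac{2^{k-1} \hat a_k}{2^{k-1} - 1} = a_n + \sum_{k \ge 2} \binom{n}{k} \frac{(-1)^k \hat a_k}{2^{k-1} - 1}.$$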
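A numerical sanity check of exercise 36 (not a proof) can be run with exact rationals. The Python sketch below computes the transform by the rule $\hat a_n = \sum_k \binom{n}{k}(-1)^k a_k$, solves the recurrence of part (c) term by term, and compares the result with the first closed form. The helper names are ours; because the $k = n$ term of the recurrence contains $x_n$ itself, the direct solver moves that term to the left-hand side before dividing.

```python
from fractions import Fraction
from math import comb


def binomial_transform(a):
    """hat a_n = sum_k C(n, k) (-1)^k a_k, for n = 0 .. len(a) - 1."""
    return [sum(comb(n, k) * (-1) ** k * a[k] for k in range(n + 1))
            for n in range(len(a))]


def solve_directly(a):
    """Solve x_n = a_n + 2^(1-n) sum_k C(n,k) x_k (n >= 2), x_0 = x_1 = 0."""
    x = [Fraction(0), Fraction(0)]
    for n in range(2, len(a)):
        c = Fraction(2, 2 ** n)                        # 2^(1-n)
        partial = sum(comb(n, k) * x[k] for k in range(n))
        x.append((a[n] + c * partial) / (1 - c))       # k = n term moved left
    return x[:len(a)]


def closed_form(a):
    """The solution claimed in part (c), in its first form."""
    ah = binomial_transform(a)
    return [sum((comb(n, k) * (-1) ** k * Fraction(2 ** (k - 1), 2 ** (k - 1) - 1) * ah[k]
                 for k in range(2, n + 1)), Fraction(0))
            for n in range(len(a))]
```

Any sequence with $a_0 = a_1 = 0$ can be fed to both solvers; applying `binomial_transform` twice should also return the original sequence, as part (a) asserts.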

37. [M28] Determine all sequences $\langle a_n \rangle$ such that $\langle \hat a_n \rangle = \langle a_n \rangle$, in the sense of
exercise 36.

► 38. [M30] Find $A_N$, $B_N$, $C_N$, $G_N$, $K_N$, $L_N$, $R_N$, and $X_N$, the average values of the
quantities in (29), when radix exchange is applied to “case (ii) input.” Express your
answers in terms of $N$ and the quantities

$$U_n = \sum_{k \ge 2} \binom{n}{k} \frac{(-1)^k}{2^{k-1} - 1}, \qquad V_n = \sum_{k \ge 2} \binom{n}{k} \frac{(-1)^k k}{2^{k-1} - 1} = n(U_n - U_{n-1}).$$

[Hint: See exercise 36.]
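The alternating sums that define $U_n$ and $V_n$ cancel heavily, so floating-point evaluation becomes unreliable as $n$ grows; exact rational arithmetic sidesteps the problem. A minimal Python sketch (function names ours) that evaluates both sums exactly and can be used to check the identity $V_n = n(U_n - U_{n-1})$:

```python
from fractions import Fraction
from math import comb


def U(n):
    """U_n = sum_{k>=2} C(n,k) (-1)^k / (2^(k-1) - 1), as an exact fraction."""
    return sum((Fraction(comb(n, k) * (-1) ** k, 2 ** (k - 1) - 1)
                for k in range(2, n + 1)), Fraction(0))


def V(n):
    """V_n = sum_{k>=2} C(n,k) (-1)^k k / (2^(k-1) - 1), as an exact fraction."""
    return sum((Fraction(comb(n, k) * (-1) ** k * k, 2 ** (k - 1) - 1)
                for k in range(2, n + 1)), Fraction(0))
```

For instance, $U_2 = 1$ and $U_3 = 3 - 1/3 = 8/3$.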

39. [20] The results shown in (30) indicate that radix exchange sorting involves about
$1.44N$ partitioning stages when it is applied to random input. Prove that quicksort
will never require more than $N$ stages; and explain why radix exchange often does.

40. [21] Explain how to modify Algorithm R so that it works with reasonable
efficiency when sorting files containing numerous equal keys.

► 41. [30] Devise a good way to exchange records $R_l \ldots R_r$ so that they are partitioned
into three blocks, with (i) $K_k < K$ for $l \le k < i$; (ii) $K_k = K$ for $i \le k \le j$; (iii)
$K_k > K$ for $j < k \le r$. Schematically, the final arrangement should be

    |    < K    |    = K    |    > K    |

42. [HM32] For any real number $c > 0$, prove that the probability is less than $e^{-c}$
that Algorithm Q will make more than $(c+1)(N+1)H_N$ comparisons when sorting
random data. (This upper bound is especially interesting when $c$ is, say, $N^\epsilon$.)

43. [HM21] Prove that $\int_0^1 y^{-1}(e^{-y} - 1)\,dy + \int_1^\infty y^{-1}e^{-y}\,dy = -\gamma$. [Hint: Consider
$\lim_{a \to 0+} y^{a-1}$.]

44.  [HM24]  Derive  (37)  as  suggested  in  the  text. 

45.  [HM20]  Explain  why  (43)  is  true,  when  x > 0. 

46. [HM20] What is the value of $\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty} \Gamma(z)\, n^{s-z}\, dz/(2^{s-z} - 1)$, given that
$s$ is a positive integer and $0 < a < s$?

47. [HM21] Prove that $\sum_{j \ge 1} (n/2^j)\, e^{-n/2^j}$ is a bounded function of $n$.

48. [HM24] Find the asymptotic value of the quantity $V_n$ defined in exercise 38, using
a method analogous to the text’s study of $U_n$, obtaining terms up to $O(1)$.

49. [HM24] Extend the asymptotic formula (47) for $U_n$ to $O(n^{-1})$.

50.  [HM24]  Find  the  asymptotic  value  of  the  function 
when $m$ is any fixed number greater than 1. (When $m$ is an integer greater than 2,
this quantity arises in the study of generalizations of radix exchange, as well as the
trie memory search algorithms of Section 6.3.)

► 51. [HM28] Show that the gamma-function approach to asymptotic problems can be
used instead of Euler’s summation formula to derive the asymptotic expansion of the
quantity $r_k(m)$ in (35). (This gives us a uniform method for studying $r_k(m)$ for all $k$,
without relying on tricks such as the text’s introduction of $g_{-1}(x) = (e^{-x^2} - 1)/x$.)

52.  [HM35]  (N.  G.  de  Bruijn.)  What  is  the  asymptotic  behavior  of  the  sum 

where $d(t)$ is the number of divisors of $t$? (Thus, $d(1) = 1$, $d(2) = d(3) = 2$, $d(4) = 3$,
$d(5) = 2$, etc. This question arises in connection with the analysis of a tree traversal
algorithm, exercise 2.3.1-11.) Find the value of $S_n/\binom{2n}{n}$ to terms of $O(n^{-1})$.

53. [HM42] Analyze the average number of bit inspections and exchanges done by
radix exchange when the input data consists of infinite-precision binary numbers in
$[0..1)$, each of whose bits is independently equal to 1 with probability $p$. (Only the
case $p = \frac12$ is discussed in the text; the methods we have used can be generalized to
arbitrary $p$.) Consider in particular the case $p = 1/\phi = .61803\ldots$.

54. [HM24] (S. O. Rice.) Show that $U_n$ can be written

$$U_n = (-1)^n \frac{n!}{2\pi i} \oint_C \frac{dz}{z(z-1)\ldots(z-n)} \, \frac{1}{2^{z-1} - 1},$$

where $C$ is a skinny closed curve encircling the points $2, 3, \ldots, n$. Changing $C$ to an
arbitrarily large circle centered at the origin, derive the convergent series

$$U_n = \frac{n(H_{n-1} - 1)}{\ln 2} - \frac{n}{2} + 2 + \frac{2}{\ln 2} \sum_{m \ge 1} \Re\bigl(B(n+1, -1 + ibm)\bigr),$$

where $b = 2\pi/\ln 2$, and $B(n+1, -1+ibm) = \Gamma(n+1)\Gamma(-1+ibm)/\Gamma(n+ibm) = n!/\prod_{k=0}^{n}(k - 1 + ibm)$.
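The convergent series of exercise 54 can be compared numerically with the exact sum for $U_n$. The Python sketch below (helper names ours) evaluates $U_n$ exactly, and evaluates the right-hand side of the series in floating point using the product form $B(n+1, -1+ibm) = n!/\prod_{k=0}^{n}(k - 1 + ibm)$, truncated after a fixed number of values of $m$ (2000 by default, an arbitrary choice that is ample for small $n$).

```python
import math
from fractions import Fraction
from math import comb


def U_exact(n):
    """U_n = sum_{k>=2} C(n,k) (-1)^k / (2^(k-1) - 1), as an exact fraction."""
    return sum((Fraction(comb(n, k) * (-1) ** k, 2 ** (k - 1) - 1)
                for k in range(2, n + 1)), Fraction(0))


def U_series(n, terms=2000):
    """Right-hand side of Rice's series, summed for 1 <= m <= terms."""
    ln2 = math.log(2)
    b = 2 * math.pi / ln2
    H = sum(1 / j for j in range(1, n))              # H_{n-1}
    total = n * (H - 1) / ln2 - n / 2 + 2
    for m in range(1, terms + 1):
        w = complex(-1, b * m)                       # -1 + ibm
        beta = complex(math.factorial(n))            # B(n+1, w) = n! / prod (k + w)
        for k in range(n + 1):
            beta /= k + w
        total += 2 / ln2 * beta.real
    return total
```

For $n = 3$, for example, the non-oscillating part alone gives about 2.66404, and the sum over $m$ supplies the remaining 0.00262 needed to reach $U_3 = 8/3$.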

► 55.  [22]  Show  how  to  modify  Program  Q so  that  the  partitioning  element  is  the 
median  of  the  three  keys  (28),  assuming  that  M > 1. 

56.  [M43]  Analyze  the  average  behavior  of  the  quantities  that  occur  in  the  running 
time  of  Algorithm  Q when  the  program  has  been  modified  to  take  the  median  of  three 
elements  as  in  exercise  55.  (See  exercise  29.) 


5.2.3.  Sorting  by  Selection 

Another  important  family  of  sorting  techniques  is  based  on  the  idea  of  repeated 
selection.  The  simplest  selection  method  is  perhaps  the  following: 

i) Find the smallest key; transfer the corresponding record to the output area;
then replace the key by the value $\infty$ (which is assumed to be higher than
any actual key).

ii) Repeat step (i). This time the second smallest key will be selected, since
the smallest key has been replaced by $\infty$.

iii)  Continue  repeating  step  (i)  until  N records  have  been  selected. 
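Steps (i)-(iii) translate almost literally into code. A minimal Python sketch, assuming numeric keys so that `math.inf` can play the role of $\infty$ (the function name is ours):

```python
import math


def naive_selection_sort(keys):
    """Selection with a separate output area, following steps (i)-(iii)."""
    work = list(keys)               # all inputs must be present before we start
    output = []                     # the separate output area
    for _ in range(len(work)):      # (iii): continue until N records are selected
        smallest = 0
        for k in range(1, len(work)):       # (i): N - 1 comparisons per selection
            if work[k] < work[smallest]:
                smallest = k
        output.append(work[smallest])
        work[smallest] = math.inf   # assumed higher than any actual key
    return output
```

Each of the $N$ selections scans every position, so this sketch makes $N(N-1)$ comparisons in all.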
A selection method requires all of the input items to be present before sorting
may  proceed,  and  it  generates  the  final  outputs  one  by  one  in  sequence.  This  is 
essentially  the  opposite  of  insertion,  where  the  inputs  are  received  sequentially 
but  we  do  not  know  any  of  the  final  outputs  until  sorting  is  completed. 

Step (i) involves $N - 1$ comparisons each time a new record is selected, and it
also  requires  a separate  output  area  in  memory.  But  we  can  obviously  do  better: 
We  can  move  the  selected  record  into  its  proper  final  position,  by  exchanging  it 
with  the  record  currently  occupying  that  position.  Then  we  need  not  consider 
that  position  again  in  future  selections,  and  we  need  not  deal  with  infinite  keys. 
This  idea  yields  our  first  selection  sorting  algorithm. 

Algorithm S (Straight selection sort). Records $R_1, \ldots, R_N$ are rearranged in
place; after sorting is complete, their keys will be in order, $K_1 \le \cdots \le K_N$.
Sorting is based on the method indicated above, except that it proves to be
more convenient to select the largest element first, then the second largest, etc.

S1. [Loop on j.] Perform steps S2 and S3 for $j = N, N-1, \ldots, 2$.

S2. [Find max($K_1, \ldots, K_j$).] Search through keys $K_j, K_{j-1}, \ldots, K_1$ to find a
maximal one; let it be $K_i$, where $i$ is as large as possible.

S3. [Exchange with $R_j$.] Interchange records $R_i \leftrightarrow R_j$. (Now records $R_j, \ldots, R_N$
are in their final position.) |
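A Python sketch of Algorithm S, using 0-based list indices (so the text’s $R_j$ is `records[j - 1]`). For later comparison with Program S it also counts $A$, the number of key comparisons, and $B$, the number of times the current maximum changes; the counters and the `key` parameter are our additions, not part of the algorithm as stated.

```python
def straight_selection_sort(records, key=lambda rec: rec):
    """Algorithm S on a Python list, in place; returns the counts (A, B).

    A = number of key comparisons in step S2,
    B = number of times the current maximum changes during those searches.
    """
    A = B = 0
    for j in range(len(records) - 1, 0, -1):        # S1: j = N, ..., 2 (0-based N-1 .. 1)
        i = j                                       # S2: search K_j, K_{j-1}, ..., K_1
        for k in range(j - 1, -1, -1):
            A += 1
            if key(records[k]) > key(records[i]):   # strict, so i stays as large as possible
                i = k
                B += 1
        records[i], records[j] = records[j], records[i]   # S3: R_i <-> R_j
    return A, B
```

Because the comparison in S2 is strict, ties leave $i$ at the largest eligible position, as the step requires. On the sixteen keys of Table 1 the sketch performs $A = 120$ comparisons.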


Fig.  21.  Straight  selection  sorting. 


Table  1 shows  this  algorithm  in  action  on  our  sixteen  example  keys.  Elements 
that  are  candidates  for  the  maximum  during  the  right-to-left  search  in  step  S2 
are  shown  in  boldface  type. 

Table  1 

STRAIGHT  SELECTION  SORTING 


503  087  512  061  908  170  897  275  653  426  154  509  612  677  765  703 |
503  087  512  061  703  170  897  275  653  426  154  509  612  677  765 | 908 
503  087  512  061  703  170  765  275  653  426  154  509  612  677 | 897  908 
503  087  512  061  703  170  677  275  653  426  154  509  612 | 765  897  908 
503  087  512  061  612  170  677  275  653  426  154  509 | 703  765  897  908 
503  087  512  061  612  170  509  275  653  426  154 | 677  703  765  897  908 

061 | 087  154  170  275  426  503  509  512  612  653  677  703  765  897  908 
The corresponding MIX program is quite simple:

Program S (Straight selection sort). As in previous programs of this chapter,
the records in locations INPUT+1 through INPUT+N are sorted in place, on a
full-word key. rA = current maximum, rI1 = j − 1, rI2 = k (the current search
position), rI3 = i. Assume that N ≥ 2.

01  START  ENT1  N-1        1      S1. Loop on j. j ← N.
02  2H     ENT2  0,1        N-1    S2. Find max(K_1, ..., K_j).
03         ENT3  1,1        N-1    i ← j.
04         LDA   INPUT,3    N-1    rA ← K_i.
05  8H     CMPA  INPUT,2    A
06         JGE   *+3        A      Jump if K_i ≥ K_k.
07         ENT3  0,2        B      Otherwise set i ← k,
08         LDA   INPUT,3    B      rA ← K_i.
09         DEC2  1          A      k ← k − 1.
10         J2P   8B         A      Repeat if k > 0.
11         LDX   INPUT+1,1  N-1    S3. Exchange with R_j.
12         STX   INPUT,3    N-1    R_i ← R_j.
13         STA   INPUT+1,1  N-1    R_j ← rA.
14         DEC1  1          N-1
15         J1P   2B         N-1    N > j ≥ 2.  |

The  running  time  of  this  program  depends  on  the  number  of  items,  N;  the 
number  of  comparisons,  A;  and  the  number  of  changes  to  right-to-left  maxima,  B. 
It  is  easy  to  see  that 

$A = \tfrac{1}{2}N(N-1)$,    (1)

regardless  of  the  values  of  the  input  keys;  hence  only  B is  variable.  In  spite  of  the 
simplicity  of  straight  selection,  this  quantity  B is  not  easy  to  analyze  precisely. 
Exercises 3 through 6 show that

$B = \bigl(\min 0,\ \text{ave } (N+1)H_N - 2N,\ \max \lfloor N^2/4 \rfloor\bigr)$;    (2)

in this case the maximum value turns out to be particularly interesting. The
standard deviation of $B$ is of order $N^{3/4}$; see exercise 7.

Thus the average running time of Program S is $2.5N^2 + 3(N+1)H_N + 3.5N - 11$
units, just slightly slower than straight insertion (Program 5.2.1S).
It  is  interesting  to  compare  Algorithm  S to  the  bubble  sort  (Algorithm  5.2.2B), 
since  bubble  sorting  may  be  regarded  as  a selection  algorithm  that  sometimes 
selects  more  than  one  element  at  a time.  For  this  reason  bubble  sorting  usually 
does  fewer  comparisons  than  straight  selection  and  it  may  seem  to  be  preferable; 
but  in  fact  Program  5.2.2B  is  more  than  twice  as  slow  as  Program  S!  Bubble 
sorting  is  handicapped  by  the  fact  that  it  does  so  many  exchanges,  while  straight 
selection  involves  very  little  data  movement. 
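For small $N$, the average value of $B$ in (2) and the average running time quoted above can be confirmed by brute force. The Python sketch below runs the loop of Program S over all $N!$ permutations. It also averages the cost $12N - 11 + 5A + 3B$, which is our own tally of the listing under the usual MIX timings (1u for ENT, DEC, and jump instructions; 2u for LDA, LDX, STA, STX, and CMPA), not a formula quoted from the text.

```python
from fractions import Fraction
from itertools import permutations


def selection_counts(keys):
    """Run the loop of Program S on a copy of `keys`; return (A, B)."""
    a = list(keys)
    A = B = 0
    for j in range(len(a) - 1, 0, -1):
        i = j
        for k in range(j - 1, -1, -1):
            A += 1
            if a[k] > a[i]:            # a new right-to-left maximum
                i = k
                B += 1
        a[i], a[j] = a[j], a[i]
    return A, B


def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def averages(n):
    """Exact averages of B and of the MIX running time over all n! permutations."""
    total_B = total_time = count = 0
    for p in permutations(range(1, n + 1)):
        A, B = selection_counts(p)
        total_B += B
        total_time += 12 * n - 11 + 5 * A + 3 * B
        count += 1
    return Fraction(total_B, count), Fraction(total_time, count)
```

For $N = 3$, for example, the six permutations give $B = 0, 1, 1, 2, 2, 2$, an average of $4/3$, which agrees with $(N+1)H_3 - 2N = 4 \cdot 11/6 - 6$.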

Refinements  of  straight  selection.  Is  there  any  way  to  improve  on  the 
selection  method  used  in  Algorithm  S?  For  example,  take  the  search  for  a 
maximum  in  step  S2;  is  there  a substantially  faster  way  to  find  a maximum? 
The  answer  to  the  latter  question  is  no! 
Lemma M. Every algorithm for finding the maximum of n elements, based on
comparing  pairs  of  elements,  must  make  at  least  n — 1 comparisons. 

Proof.  If  we  have  made  fewer  than  n — 1 comparisons,  there  will  be  at  least  two 
elements  that  have  never  been  found  to  be  less  than  any  others.  Therefore  we  do 
not  know  which  of  these  two  elements  is  larger,  and  we  cannot  have  determined 
the  maximum.  | 

Thus,  any  selection  process  that  finds  the  largest  element  must  perform  at 
least  n — 1 comparisons;  and  we  might  suspect  that  all  sorting  methods  based  on 
n repeated selections are doomed to require $\Omega(n^2)$ operations. But fortunately
Lemma  M applies  only  to  the  first  selection  step;  subsequent  selections  can  make 
use  of  previously  gained  information.  For  example,  exercises  8 and  9 show  that 
a comparatively  simple  change  to  Algorithm  S will  cut  the  average  number  of 
comparisons  in  half. 

Consider  the  16  numbers  in  Table  1;  one  way  to  save  time  on  repeated 
selections  is  to  regard  them  as  four  groups  of  four.  We  can  start  by  determining 
the  largest  in  each  group,  namely  the  respective  keys 

512,  908,  653,  765; 

the  largest  of  these  four  elements,  908,  is  then  the  largest  of  the  entire  file.  To 
get  the  second  largest  we  need  only  look  at  512,  653,  765,  and  the  other  three 
elements  of  the  group  containing  908;  the  largest  of  {170,  897,  275}  is  897,  and 
the  largest  of 

512,  897,  653,  765 

is  897.  Similarly,  to  get  the  third  largest  element  we  determine  the  largest  of 
{170,  275}  and  then  the  largest  of 


512,  275,  653,  765. 

Each selection after the first takes at most 5 additional comparisons. In general,
if $N$ is a perfect square, we can divide the file into $\sqrt N$ groups of $\sqrt N$ elements
each; each selection after the first takes at most $\sqrt N - 2$ comparisons within
the group of the previously selected item, plus $\sqrt N - 1$ comparisons among the
“group leaders.” This idea is called quadratic selection; its total execution time
is $O(N\sqrt N)$, which is substantially better than order $N^2$.
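A Python sketch of quadratic selection that produces the keys in decreasing order. It assumes that $N$ is a perfect square and that the groups consist of consecutive elements, as in the 16-key example; `-inf` serves as the leader of an exhausted group. The function names are ours.

```python
from math import inf, isqrt


def group_leaders(keys):
    """Split keys into sqrt(N) consecutive groups; return (groups, leaders)."""
    n = len(keys)
    g = isqrt(n)
    if g * g != n:
        raise ValueError("this sketch assumes N is a perfect square")
    groups = [list(keys[t * g:(t + 1) * g]) for t in range(g)]
    return groups, [max(grp) for grp in groups]


def quadratic_selection(keys):
    groups, leaders = group_leaders(keys)
    output = []
    for _ in range(len(keys)):
        t = max(range(len(leaders)), key=leaders.__getitem__)  # sqrt(N) - 1 comparisons
        output.append(leaders[t])
        groups[t].remove(leaders[t])                 # drop the selected key from its group
        leaders[t] = max(groups[t], default=-inf)    # at most sqrt(N) - 2 comparisons
    return output
```

On the sixteen keys of Table 1 the initial leaders come out as 512, 908, 653, 765, matching the discussion above.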

Quadratic selection was first published by E. H. Friend [JACM 3 (1956),
152-154], who pointed out that the same idea can be generalized to cubic,
quartic, and higher degrees of selection. For example, cubic selection divides the
file into $\sqrt[3]N$ large groups, each containing $\sqrt[3]N$ small groups, each containing $\sqrt[3]N$
records; the execution time is proportional to $N\sqrt[3]N$. If we carry this idea to its
ultimate conclusion we arrive at what Friend called “nth degree selecting,” based
on a binary tree structure. This method has an execution time proportional to
$N \log N$; we shall call it tree selection.

Tree  selection.  The  principles  of  tree  selection  sorting  are  easy  to  understand 
in  terms  of  matches  in  a typical  “knockout  tournament.”  Consider,  for  example, 
the results of the ping-pong contest shown in Fig. 22; at the bottom level, Kim
beats  Sandy  and  Chris  beats  Lou,  then  in  the  next  round  Chris  beats  Kim,  etc. 


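[Tournament bracket. First round: Kim beats Sandy, Chris beats Lou, Pat beats Ray,
and Robin beats Dale. Second round: Chris beats Kim, and Pat beats Robin.
Final: Chris beats Pat.]

Fig. 22. A ping-pong tournament.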


Figure  22  shows  that  Chris  is  the  champion  of  the  eight  players,  and  8-1  = 7 
matches/comparisons  were  required  to  determine  this  fact.  Pat  is  not  necessarily 
the second-best player; any of the people defeated by Chris, including the
first-round loser Lou, might possibly be second best. We can determine the
second-best player by having Lou play Kim, and the winner of that match plays Pat:
only  two  additional  matches  are  required  to  find  the  second-best  player,  because 
of  the  structure  we  have  remembered  from  the  earlier  games. 

In  general,  we  can  “output”  the  player  at  the  root  of  the  tree,  and  replay 
the  tournament  as  if  that  player  had  been  sick  and  unable  to  play  a good  game. 
Then  the  original  second-best  player  will  rise  to  the  root;  and  to  recalculate  the 
winners  in  the  upper  levels  of  the  tree,  only  one  path  must  be  changed.  It  follows 
that fewer than $\lceil \lg N \rceil$ further comparisons are needed to select the second-best
player.  The  same  procedure  will  find  the  third-best,  etc.;  hence  the  total  time  for 
such  a selection  sort  will  be  roughly  proportional  to  N log  N,  as  claimed  above. 

Figure  23  shows  tree  selection  sorting  in  action,  on  our  16  example  numbers. 
Notice  that  we  need  to  know  where  the  key  at  the  root  came  from,  in  order  to 
know where to insert the next “$-\infty$”. Therefore each branch node of the tree
should  actually  contain  a pointer  or  index  specifying  the  position  of  the  relevant 
key,  instead  of  the  key  itself.  It  follows  that  we  need  memory  space  for  N input 
records, $N - 1$ pointers, and $N$ output records or pointers to those records.
(If  the  output  goes  to  tape  or  disk,  of  course,  we  don’t  need  to  retain  the  output 
records  in  high-speed  memory.) 
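A Python sketch of tree selection in the form just described: each branch node stores the index of the winning key rather than the key itself, so that after the champion is output we know which leaf to overwrite with $-\infty$ and which single path to replay. For simplicity it assumes that $N$ is a power of 2 and numbers the nodes of the complete binary tree with the root as node 1 and the children of node $k$ as $2k$ and $2k+1$; these choices and all names are ours.

```python
NEG = float("-inf")


def build_tournament(keys):
    """Play the initial knockout tournament.

    Returns (key, win): key[n + t] holds the t-th input key (leaves are nodes
    n .. 2n-1), and win[k] is the leaf number of the winner below node k.
    """
    n = len(keys)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of 2")
    key = [None] * n + list(keys)
    win = [0] * n + list(range(n, 2 * n))   # a leaf "wins" its own subtree
    for k in range(n - 1, 0, -1):           # bottom-up, one match per branch node
        a, b = win[2 * k], win[2 * k + 1]
        win[k] = a if key[a] >= key[b] else b
    return key, win


def tree_selection_sort(keys):
    """Return the keys in decreasing order by repeated tournament replays."""
    key, win = build_tournament(keys)
    output = []
    for _ in range(len(keys)):
        leaf = win[1]                       # the champion's leaf, found via the pointer
        output.append(key[leaf])
        key[leaf] = NEG                     # the champion drops out
        k = leaf // 2
        while k >= 1:                       # replay only the path up to the root
            a, b = win[2 * k], win[2 * k + 1]
            win[k] = a if key[a] >= key[b] else b
            k //= 2
    return output
```

On the sixteen example keys, the winners at the two levels above the leaves are 503 512 908 897 653 509 677 765 and 512 908 653 765, as in Fig. 23(a).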

The  reader  should  pause  at  this  point  and  work  exercise  10,  because  a good 
understanding  of  the  basic  principles  of  tree  selection  will  make  it  easier  to 
appreciate  the  remarkable  improvements  we  are  about  to  discuss. 

One  way  to  modify  tree  selection,  essentially  introduced  by  K.  E.  Iverson 
[A  Programming  Language  (Wiley,  1962),  223-227],  does  away  with  the  need  for 
pointers  by  “looking  ahead”  in  the  following  way:  When  the  winner  of  a match 
in  the  bottom  level  of  the  tree  is  moved  up,  the  winning  value  can  be  replaced 
immediately by $-\infty$ at the bottom level; and whenever a winner moves up from
one  branch  to  another,  we  can  replace  the  corresponding  value  by  the  one  that 
should  eventually  move  up  into  the  vacated  place  (namely  the  larger  of  the  two 
keys  below).  Repeating  this  operation  as  often  as  possible  converts  Fig.  23(a) 
into  Fig.  24. 
           512                 908                 653                 765
      503       512       908       897       653       509       677       765
   503  087  512  061  908  170  897  275  653  426  154  509  612  677  765  703

(a) Initial configuration.

           512                 897                 653                 765
      503       512       170       897       653       509       677       765
   503  087  512  061   −∞  170  897  275  653  426  154  509  612  677  765  703

(b) Key 908 is replaced by −∞, and the second highest element moves up to the root.

      503       512       170       275       426       509        −∞        −∞
   503  087  512  061   −∞  170   −∞  275   −∞  426  154  509   −∞   −∞   −∞   −∞

(c) Configuration after 908, 897, 765, 703, 677, 653, and 612 have been output.

Fig. 23. An example of tree selection sorting.


           512                 275                 653                 703
      503       061       170        −∞       426       509       677        −∞
    −∞  087   −∞   −∞   −∞   −∞   −∞   −∞   −∞   −∞  154   −∞  612   −∞   −∞   −∞

Fig. 24. The Peter Principle applied to sorting. Everyone rises to their level of
incompetence in the hierarchy.




Once the tree has been set up in this way we can proceed to sort by a
“top-down” method, instead of the “bottom-up” method of Fig. 23: We output the
root,  then  move  up  its  largest  descendant,  then  move  up  the  latter’s  largest 
descendant,  and  so  forth.  The  process  begins  to  look  less  like  a ping-pong 
tournament  and  more  like  a corporate  system  of  promotions. 

The reader should be able to see that this top-down method has the advantage
that redundant comparisons of $-\infty$ with $-\infty$ can be avoided. (The
bottom-up approach finds $-\infty$ omnipresent in the latter stages of sorting, but
the top-down approach can stop modifying the tree during each stage as soon
as a $-\infty$ has been stored.)
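The look-ahead promotions and the top-down sort can be sketched in Python as follows (assumptions: $N$ a power of 2, node numbering as in the complete binary tree, names ours). `promote_all` fills every branch node by moving up the larger child and refilling the vacated node from below; applied to the sixteen example keys it reproduces the arrangement of Fig. 24. `top_down_sort` then outputs the root and promotes its larger descendants, stopping each stage as soon as $-\infty$ has been stored.

```python
NEG = float("-inf")


def refill(tree, k):
    """Node k has just been vacated: move up its larger child, and so on down."""
    n = len(tree) // 2                 # leaves are nodes n .. 2n-1
    while k < n:
        c = 2 * k if tree[2 * k] >= tree[2 * k + 1] else 2 * k + 1
        tree[k] = tree[c]
        if tree[c] == NEG:
            return                     # -inf stored: nothing below needs to change
        k = c
    tree[k] = NEG                      # a vacated leaf becomes -inf


def promote_all(keys):
    """Apply the look-ahead promotions; return the tree (index 0 unused)."""
    n = len(keys)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of 2")
    tree = [NEG] * n + list(keys)
    for k in range(n - 1, 0, -1):      # fill each empty branch node, bottom-up
        refill(tree, k)
    return tree


def top_down_sort(keys):
    """Output the root, promote its larger descendants, repeat (decreasing order)."""
    tree = promote_all(keys)
    output = []
    for _ in range(len(keys)):
        output.append(tree[1])
        refill(tree, 1)
    return output
```

In the resulting tree the root holds 908 and its children hold 897 and 765; the rows below agree with Fig. 24.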

Figures 23 and 24 are complete binary trees with 16 terminal nodes (see
Section 2.3.4.5), and it is convenient to represent such trees in consecutive
locations as shown in Fig. 25. Note that the parent of node number $k$ is node
$\lfloor k/2 \rfloor$, and its children are nodes $2k$ and $2k+1$. This leads to another advantage
of the top-down approach, since it is often considerably simpler to go top-down
from node $k$ to nodes $2k$ and $2k+1$ than bottom-up from node $k$ to nodes $k \oplus 1$
and $\lfloor k/2 \rfloor$. (Here $k \oplus 1$ stands for $k+1$ or $k-1$, according as $k$ is even or odd.)
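With sequential allocation these moves become one-line integer operations. A Python sketch (function names ours); computing $k \oplus 1$ as `k ^ 1`, a bitwise exclusive or, is our implementation choice:

```python
def parent(k):
    return k // 2            # floor(k/2)


def children(k):
    return 2 * k, 2 * k + 1


def sibling(k):
    return k ^ 1             # k + 1 if k is even, k - 1 if k is odd
```

Going down needs only a doubling, while going across and up needs both a parity-dependent step and a halving, which is the asymmetry the paragraph above points out.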


Fig.  25.  Sequential  storage  allocation  for  a complete  binary  tree. 


Our  examples  of  tree  selection  so  far  have  more  or  less  assumed  that  N is 
a power  of  2;  but  actually  we  can  work  with  arbitrary  N,  since  the  complete 
binary  tree  with  N terminal  nodes  is  readily  constructed  for  any  N. 

Now  we  come  to  the  crucial  question:  Can’t  we  do  the  top-down  method 
without using $-\infty$ at all? Wouldn’t it be nice if the important information of
Fig. 24 were all in locations 1 through 16 of the complete binary tree, without the
useless “holes” containing $-\infty$? Some reflection shows that it is indeed possible
to achieve this goal, not only eliminating $-\infty$ but also avoiding the need for an
auxiliary output area. This line of thinking leads us to an important sorting
algorithm that was christened “heapsort” by its discoverer J. W. J. Williams
[CACM 7 (1964), 347-348].

Heapsort. Let us say that a file of keys $K_1, K_2, \ldots, K_N$ is a heap if

$K_{\lfloor j/2 \rfloor} \ge K_j$ for $1 \le \lfloor j/2 \rfloor < j \le N$.    (3)
Thus, $K_1 \ge K_2$, $K_1 \ge K_3$, $K_2 \ge K_4$, etc.; this is exactly the condition that
holds in Fig. 24, and it implies in particular that the largest key appears “on top
of the heap,”

$K_1 = \max(K_1, K_2, \ldots, K_N)$.    (4)

If  we  can  somehow  transform  an  arbitrary  input  file  into  a heap,  we  can  sort  the 
elements  by  using  a top-down  selection  procedure  as  described  above. 
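Condition (3) is easy to test mechanically. A Python sketch (name ours) that stores $K_1, \ldots, K_N$ at list positions 0 through $N - 1$, so that key $K_j$ is `K[j - 1]`:

```python
def is_heap(K):
    """Condition (3): K_{floor(j/2)} >= K_j for 1 <= floor(j/2) < j <= N."""
    N = len(K)
    return all(K[j // 2 - 1] >= K[j - 1] for j in range(2, N + 1))
```

For example, the top seven nodes of Fig. 24, read level by level as 908 897 765 512 275 653 703, pass the test.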

An efficient approach to heap creation has been suggested by R. W. Floyd
[CACM 7 (1964), 701]. Let us assume that we have been able to arrange the file
so that

$K_{\lfloor j/2 \rfloor} \ge K_j$ for $l \le \lfloor j/2 \rfloor < j \le N$,    (5)

where $l$ is some number $\ge 1$. (In the original file this condition holds vacuously for
$l = \lfloor N/2 \rfloor + 1$, since no subscript $j$ satisfies the condition $\lfloor N/2 \rfloor < \lfloor j/2 \rfloor < j \le N$.)
It is not difficult to see how to transform the file so that the inequalities in (5)
are extended to the case $l = \lfloor j/2 \rfloor$, working entirely in the subtree whose root
is node $l$. Then we can decrease $l$ by 1, until condition (3) is finally achieved.
These ideas of Williams and Floyd lead to the following elegant algorithm, which
merits careful study:

Algorithm H (Heapsort). Records $R_1, \ldots, R_N$ are rearranged in place; after
sorting is complete, their keys will be in order, $K_1 \le \cdots \le K_N$. First we
rearrange the file so that it forms a heap, then we repeatedly remove the top of
the heap and transfer it to its proper final position. Assume that $N \ge 2$.

HI.  [Initialize.]  Set  l «—  \_N/2\  + 1,  r 4—  N. 

H2.  [Decrease  l or  r.]  If  l > 1,  set  / 4-  l - 1,  R i?/,  K <-  Kt.  (If  l > 1,  we  are 
in  the  process  of  transforming  the  input  file  into  a heap;  on  the  other  hand 
if  l = 1,  the  keys  Ki  K-2  . . . Kr  presently  constitute  a heap.)  Otherwise  set 
R <—  Rr,  K <—  Kr,  Rr  4—  Ri,  and  r <—  r — 1;  if  this  makes  r = 1,  set 
Ri  4—  R and  terminate  the  algorithm. 

H3. [Prepare for siftup.] Set $j \gets l$. (At this point we have

$$K_{\lfloor k/2\rfloor} \ge K_k \qquad \text{for } l < \lfloor k/2\rfloor < k \le r; \qquad (6)$$

and record $R_k$ is in its final position for $r < k \le N$. Steps H3–H8 are called
the siftup algorithm; their effect is equivalent to setting $R_l \gets R$ and then
rearranging $R_l, \ldots, R_r$ so that condition (6) holds also for $l = \lfloor k/2\rfloor$.)

H4. [Advance downward.] Set $i \gets j$ and $j \gets 2j$. (In the following steps we
have $i = \lfloor j/2\rfloor$.) If $j < r$, go right on to step H5; if $j = r$, go to step H6;
and if $j > r$, go to H8.

H5. [Find larger child.] If $K_j < K_{j+1}$, then set $j \gets j + 1$.

H6. [Larger than K?] If $K \ge K_j$, then go to step H8.

H7. [Move it up.] Set $R_i \gets R_j$, and go back to step H4.

H8. [Store R.] Set $R_i \gets R$. (This terminates the siftup algorithm initiated in
step H3.) Return to step H2. ∎
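A direct Python transcription of steps H1 through H8 may help in tracing Table 2. It is a sketch that assumes records are bare keys (so R and K coincide) and uses 1-origin indexing with an unused `K[0]`; the name `heapsort` and the early return for N < 2 are our additions, since Algorithm H itself assumes N ≥ 2.

```python
def heapsort(keys):
    """Return the keys in increasing order, following steps H1-H8 of Algorithm H."""
    N = len(keys)
    K = [None] + list(keys)            # 1-origin: K[1..N]; K[0] is unused
    if N < 2:                          # Algorithm H assumes N >= 2
        return K[1:]
    l, r = N // 2 + 1, N               # H1. Initialize.
    while True:
        if l > 1:                      # H2. Decrease l or r.
            l -= 1
            key = K[l]                 # still building the heap
        else:
            key = K[r]                 # K[1..r] is a heap: its top K[1]
            K[r] = K[1]                # goes to its final place r
            r -= 1
            if r == 1:
                K[1] = key
                return K[1:]
        j = l                          # H3. Prepare for siftup.
        while True:
            i, j = j, 2 * j            # H4. Advance downward.
            if j > r:
                break                  # node i has no children: go to H8
            if j < r and K[j] < K[j + 1]:
                j += 1                 # H5. Find larger child.
            if key >= K[j]:            # H6. Larger than K?
                break
            K[i] = K[j]                # H7. Move it up.
        K[i] = key                     # H8. Store R.
```

For instance, `heapsort([503, 87, 512, 61, 908])` returns `[61, 87, 503, 512, 908]`.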
Fig. 26. Heapsort; dotted lines enclose the siftup algorithm.

Heapsort  has  sometimes  been  described  as  the  @ algorithm,  because  of  the 
motion  of  l and  r.  The  upper  triangle  represents  the  heap-creation  phase,  when 
r = N and l decreases to 1; and the lower triangle represents the selection phase,
when  l = 1 and  r decreases  to  1.  Table  2 shows  the  process  of  heapsorting  our 
sixteen  example  numbers.  (Each  line  in  that  table  shows  the  state  of  affairs  at 
the  beginning  of  step  H2,  and  brackets  indicate  the  position  of  l and  r.) 

Program H (Heapsort). The records in locations INPUT+1 through INPUT+N
are sorted by Algorithm H, with the following register assignments: $\text{rI1} \equiv l - 1$,
$\text{rI2} \equiv r - 1$, $\text{rI3} \equiv i$, $\text{rI4} \equiv j$, $\text{rI5} \equiv r - j$, $\text{rA} \equiv K \equiv R$, $\text{rX} \equiv R_j$.


01  START  ENT1  N/2          1        H1. Initialize. l ← ⌊N/2⌋ + 1.
02         ENT2  N-1          1        r ← N.
03  1H     DEC1  1            ⌊N/2⌋    l ← l − 1.
04         LDA   INPUT+1,1    ⌊N/2⌋    R ← R_l, K ← K_l.
05  3H     ENT4  1,1          P        H3. Prepare for siftup. j ← l.
06         ENT5  0,2          P
07         DEC5  0,1          P        rI5 ← r − j.
08         JMP   4F           P        To H4.
09  5H     LDX   INPUT,4      B+A-D    H5. Find larger child.
10         CMPX  INPUT+1,4    B+A-D
11         JGE   6F           B+A-D    Jump if K_j ≥ K_{j+1}.
12         INC4  1            C        Otherwise set j ← j + 1.
13         DEC5  1            C
14  9H     LDX   INPUT,4      C+D      rX ← R_j.
15  6H     CMPA  INPUT,4      B+A      H6. Larger than K?
16         JGE   8F           B+A      To H8 if K ≥ K_j.
17  7H     STX   INPUT,3      B        H7. Move it up. R_i ← R_j.
18  4H     ENT3  0,4          B+P      H4. Advance downward. i ← j.
19         DEC5  0,4          B+P      rI5 ← rI5 − j.
20         INC4  0,4          B+P      j ← j + j.
21         J5P   5B           B+P      To H5 if j < r.
22         J5Z   9B           P-A+D    To H6 if j = r.
23  8H     STA   INPUT,3      P        H8. Store R. R_i ← R.
24  2H     J1P   1B           P        H2. Decrease l or r.
25         LDA   INPUT+1,2    N-1      If l = 1, set R ← R_r, K ← K_r.
26         LDX   INPUT+1      N-1
27         STX   INPUT+1,2    N-1      R_r ← R_1.
28         DEC2  1            N-1      r ← r − 1.
29         J2P   3B           N-1      To H3 if r > 1.
30         STA   INPUT+1      1        R_1 ← R. ∎

Table 2

EXAMPLE OF HEAPSORT

  K1  K2  K3  K4  K5  K6  K7  K8  K9 K10 K11 K12 K13 K14 K15 K16    l   r
 503 087 512 061 908 170 897 275[653 426 154 509 612 677 765 703]   9  16
 503 087 512 061 908 170 897[703 653 426 154 509 612 677 765 275]   8  16
 503 087 512 061 908 170[897 703 653 426 154 509 612 677 765 275]   7  16
 503 087 512 061 908[612 897 703 653 426 154 509 170 677 765 275]   6  16
 503 087 512 061[908 612 897 703 653 426 154 509 170 677 765 275]   5  16
 503 087 512[703 908 612 897 275 653 426 154 509 170 677 765 061]   4  16
 503 087[897 703 908 612 765 275 653 426 154 509 170 677 512 061]   3  16
 503[908 897 703 426 612 765 275 653 087 154 509 170 677 512 061]   2  16
[908 703 897 653 426 612 765 275 503 087 154 509 170 677 512 061]   1  16
[897 703 765 653 426 612 677 275 503 087 154 509 170 061 512]908    1  15
[765 703 677 653 426 612 512 275 503 087 154 509 170 061]897 908    1  14
[703 653 677 503 426 612 512 275 061 087 154 509 170]765 897 908    1  13
[677 653 612 503 426 509 512 275 061 087 154 170]703 765 897 908    1  12
[653 503 612 275 426 509 512 170 061 087 154]677 703 765 897 908    1  11
[612 503 512 275 426 509 154 170 061 087]653 677 703 765 897 908    1  10
[512 503 509 275 426 087 154 170 061]612 653 677 703 765 897 908    1   9
[509 503 154 275 426 087 061 170]512 612 653 677 703 765 897 908    1   8
[503 426 154 275 170 087 061]509 512 612 653 677 703 765 897 908    1   7
[426 275 154 061 170 087]503 509 512 612 653 677 703 765 897 908    1   6
[275 170 154 061 087]426 503 509 512 612 653 677 703 765 897 908    1   5
[170 087 154 061]275 426 503 509 512 612 653 677 703 765 897 908    1   4
[154 087 061]170 275 426 503 509 512 612 653 677 703 765 897 908    1   3
[087 061]154 170 275 426 503 509 512 612 653 677 703 765 897 908    1   2

Although  this  program  is  only  about  twice  as  long  as  Program  S,  it  is  much 
more  efficient  when  N is  large.  Its  running  time  depends  on 

$P = N + \lfloor N/2\rfloor - 2$, the number of siftup passes;

A,  the  number  of  siftup  passes  in  which  the  key  K finally  lands 
in  an  interior  node  of  the  heap; 

B,  the  total  number  of  keys  promoted  during  siftups; 

C, the number of times $j \gets j + 1$ in step H5; and

D,  the  number  of  times  j = r in  step  H4. 
These quantities are analyzed below; in practice they show comparatively little
fluctuation about their average values,

$$A \approx 0.349N, \qquad B \approx N\lg N - 1.87N, \qquad C \approx \tfrac12 N\lg N - 0.94N, \qquad D \approx \lg N. \qquad (7)$$

For example, when $N = 1000$, four experiments on random input gave, respectively,
$A = 371, 351, 341, 340$; $B = 8055, 8072, 8094, 8108$; $C = 4056, 4087, 4017, 4083$;
and $D = 12, 14, 8, 13$. The total running time,

$$7A + 14B + 4C + 20N - 2D + 15\lfloor N/2\rfloor - 28,$$

is therefore approximately $16N\lg N + 0.01N$ units on the average.

A glance at Table 2 makes it hard to believe that heapsort is very efficient:
large keys migrate to the left before we stash them at the right! It is indeed a
strange way to sort, when N is small; the sorting time for the 16 keys in Table 2
is 1068u, while the simple method of straight insertion (Program 5.2.1S) takes
only 514u. Straight selection (Program S) takes 853u.
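To see where the total of 1068u comes from, one can count A, B, C, D, and P while running Algorithm H, then evaluate the timing formula above. The following Python sketch is an illustration under our own assumptions: keys stand in for records, `heapsort_counts` and `mix_time` are hypothetical helper names, and nothing here simulates MIX itself; the counters are simply placed where Program H executes the corresponding instructions.

```python
def heapsort_counts(keys):
    """Run Algorithm H on a copy of keys (len >= 2); return (sorted keys, counts)."""
    n = len(keys)
    K = [None] + list(keys)             # 1-origin: K[1..n]
    A = B = C = D = P = 0
    l, r = n // 2 + 1, n                # H1
    while True:
        if l > 1:                       # H2, heap-creation phase
            l -= 1
            key = K[l]
        else:                           # H2, selection phase
            key = K[r]
            K[r] = K[1]
            r -= 1
            if r == 1:
                K[1] = key
                break
        P += 1                          # one more siftup pass (H3)
        j = l
        while True:
            i, j = j, 2 * j             # H4
            if j > r:
                break                   # K lands at a leaf
            if j == r:
                D += 1                  # only one child: skip H5
            elif K[j] < K[j + 1]:       # H5
                j += 1
                C += 1
            if key >= K[j]:             # H6
                A += 1                  # K lands at an interior node
                break
            K[i] = K[j]                 # H7: one key promoted
            B += 1
        K[i] = key                      # H8
    return K[1:], {"A": A, "B": B, "C": C, "D": D, "P": P}


def mix_time(n, c):
    """Evaluate 7A + 14B + 4C + 20N - 2D + 15*floor(N/2) - 28 (MIX time units)."""
    return (7 * c["A"] + 14 * c["B"] + 4 * c["C"] + 20 * n
            - 2 * c["D"] + 15 * (n // 2) - 28)
```

For the sixteen keys of Table 2 it reports A = 4, B = 40, C = 20, D = 6, and P = 22, so `mix_time` gives 28 + 560 + 80 + 320 − 12 + 120 − 28 = 1068.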

For larger N, Program H is more efficient. It invites comparison with
shellsort (Program 5.2.1D) and quicksort (Program 5.2.2Q), since all three
programs sort by comparisons of keys and use little or no auxiliary storage. When
$N = 1000$, the approximate average running times on MIX are

160000u for heapsort,

130000u for shellsort,

80000u for quicksort.

(MIX is a typical computer, but particular machines will of course yield somewhat
different relative values.) As N gets larger, heapsort will be superior to
shellsort, but its asymptotic running time $16N\lg N \approx 23.08N\ln N$ will never beat
quicksort's $11.67N\ln N$. A modification of heapsort discussed in exercise 18 will
speed up the process by substantially reducing the number of comparisons, but
even this improvement falls short of quicksort.

On the other hand, quicksort is efficient only on the average, and its worst
case is of order $N^2$. Heapsort has the interesting property that its worst case
isn’t much worse than the average: We always have

$$A \le 1.5N, \qquad B \le N\lfloor\lg N\rfloor, \qquad C \le N\lfloor\lg N\rfloor, \qquad (8)$$

so Program H will take no more than $18N\lfloor\lg N\rfloor + 38N$ units of time, regardless
of the distribution of the input data. Heapsort is the first sorting method we
have seen that is guaranteed to be of order $N\log N$. Merge sorting, discussed in
Section 5.2.4 below, also has this property, but it requires more memory space.

Largest  in,  first  out.  We  have  seen  in  Chapter  2 that  linear  lists  can  often  be 
classified  in  a meaningful  way  by  the  nature  of  the  insertion  and  deletion  oper- 
ations that  make  them  grow  and  shrink.  A stack  has  last-in-first-out  behavior, 
in  the  sense  that  every  deletion  removes  the  youngest  item  in  the  list  — the  item 
that  was  inserted  most  recently  of  all  items  currently  present.  A simple  queue 
has first-in-first-out behavior, in the sense that every deletion removes the oldest
remaining  item.  In  more  complex  situations,  such  as  the  elevator  simulation  of 
Section  2.2.5,  we  want  a smallest-in-first-out  list,  where  every  deletion  removes 
the  item  having  the  smallest  key.  Such  a list  may  be  called  a priority  queue, 
since  the  key  of  each  item  reflects  its  relative  ability  to  get  out  of  the  list  quickly. 
Selection  sorting  is  a special  case  of  a priority  queue  in  which  we  do  N insertions 
followed  by  N deletions. 

Priority  queues  arise  in  a wide  variety  of  applications.  For  example,  some 
numerical  iterative  schemes  are  based  on  repeated  selection  of  an  item  having 
the  largest  (or  smallest)  value  of  some  test  criterion;  parameters  of  the  selected 
item  are  changed,  and  it  is  reinserted  into  the  list  with  a new  test  value,  based  on 
the  new  values  of  its  parameters.  Operating  systems  often  make  use  of  priority 
queues  for  the  scheduling  of  jobs.  Exercises  15,  29,  and  36  mention  other  typical 
applications  of  priority  queues,  and  many  other  examples  will  appear  in  later 
chapters. 

How  shall  we  implement  priority  queues?  One  of  the  obvious  methods  is 
to  maintain  a sorted  list,  containing  the  items  in  order  of  their  keys.  Inserting 
a new  item  is  then  essentially  the  same  problem  we  have  treated  in  our  study 
of  insertion  sorting,  Section  5.2.1.  Another  even  more  obvious  way  to  deal  with 
priority  queues  is  to  keep  the  list  of  elements  in  arbitrary  order,  selecting  the 
appropriate  element  each  time  a deletion  is  required  by  finding  the  largest  (or 
smallest)  key.  The  trouble  with  both  of  these  obvious  approaches  is  that  they 
require $\Omega(N)$ steps either for insertion or deletion, when there are N entries in
the  list,  so  they  are  very  time-consuming  when  N is  large. 

In his original paper on heapsorting, Williams pointed out that heaps are
ideally suited to large priority queue applications, since we can insert or delete
elements from a heap in $O(\log N)$ steps; furthermore, all elements of the heap
are compactly located in consecutive memory locations. The selection phase of
Algorithm H is a sequence of deletion steps of a largest-in-first-out process: To
delete the largest element $K_1$ we remove it and sift $K_N$ up into a new heap of
$N - 1$ elements. (If we want a smallest-in-first-out algorithm, as in the elevator
simulation, we can obviously change the definition of heap so that “≥” becomes
“≤” in (3); for convenience, we shall consider only the largest-in-first-out case
here.) In general, if we want to delete the largest item and then insert a new
element x, we can do the siftup procedure with

$$l = 1, \qquad r = N, \qquad K = x.$$

If  we  wish  to  insert  an  element  x without  a prior  deletion,  we  can  use  the  bottom- 
up  procedure  of  exercise  16. 
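The two heap operations just mentioned can be sketched in Python as a largest-in-first-out priority queue. This is a minimal sketch under our own assumptions: the queue is a Python list `K` whose slot `K[0]` is an unused placeholder, so that `K[1..N]` satisfies the heap condition (3), and the names `replace_top`, `delete_max`, and `insert` are hypothetical. `replace_top` is the siftup with l = 1, r = N, K = x; `insert` moves the new key toward the root from a new leaf, which is one standard way to insert without a prior deletion and is not claimed to be the book's answer to exercise 16.

```python
def replace_top(K, x):
    """Delete the largest key K[1] and insert x, by one siftup with l = 1, r = N."""
    top = K[1]
    r = len(K) - 1
    i, j = 1, 2
    while j <= r:
        if j < r and K[j] < K[j + 1]:   # larger child
            j += 1
        if x >= K[j]:
            break
        K[i] = K[j]                     # promote the child
        i, j = j, 2 * j
    K[i] = x
    return top


def delete_max(K):
    """Remove and return K[1]; K[N] is sifted into a heap of N - 1 elements."""
    last = K.pop()
    if len(K) == 1:                     # the queue held a single key
        return last
    return replace_top(K, last)


def insert(K, x):
    """Insert x with no prior deletion: move it up from a new leaf N + 1."""
    K.append(x)
    i = len(K) - 1
    while i > 1 and K[i // 2] < x:      # parent is smaller: shift it down
        K[i] = K[i // 2]
        i //= 2
    K[i] = x


queue = [None]                          # an empty priority queue
for key in [503, 87, 512, 61, 908]:
    insert(queue, key)
```

After the five insertions at the end of the block, `queue[1:]` is `[908, 512, 503, 61, 87]`; `delete_max(queue)` then returns 908, and a subsequent `replace_top(queue, 700)` returns 512.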

A linked  representation  for  priority  queues.  An  efficient  way  to  represent 
priority  queues  as  linked  binary  trees  was  discovered  in  1971  by  Clark  A.  Crane 
[Technical  Report  STAN-CS-72-259  (Computer  Science  Department,  Stanford 
University,  1972)].  His  method  requires  two  link  fields  and  a small  count  in 
every  record,  but  it  has  the  following  advantages  over  a heap: 
i) When the priority queue is being treated as a stack, the insertion and
deletion  operations  take  a fixed  time  independent  of  the  queue  size. 

ii)  The  records  never  move,  only  the  pointers  change. 

iii) Two disjoint priority queues, having a total of N elements, can easily be
merged into a single priority queue, in only $O(\log N)$ steps.

Crane’s  original  method,  slightly  modified,  is  illustrated  in  Fig.  27,  which 
shows  a special  kind  of  binary  tree  structure.  Each  node  contains  a KEY  field,  a 
DIST  field,  and  two  link  fields  LEFT  and  RIGHT.  The  DIST  field  is  always  set  to 
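the length of a shortest path from that node to the null link $\Lambda$; in other words,
it is the distance from that node to the nearest empty subtree. If we define
$\mathrm{DIST}(\Lambda) = 0$ and $\mathrm{KEY}(\Lambda) = -\infty$, the KEY and DIST fields in the tree satisfy the
following properties:

$$\mathrm{KEY}(P) \ge \mathrm{KEY}(\mathrm{LEFT}(P)), \qquad \mathrm{KEY}(P) \ge \mathrm{KEY}(\mathrm{RIGHT}(P)); \qquad (9)$$

$$\mathrm{DIST}(P) = 1 + \min\bigl(\mathrm{DIST}(\mathrm{LEFT}(P)),\, \mathrm{DIST}(\mathrm{RIGHT}(P))\bigr); \qquad (10)$$

$$\mathrm{DIST}(\mathrm{LEFT}(P)) \ge \mathrm{DIST}(\mathrm{RIGHT}(P)). \qquad (11)$$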

Relation  (9)  is  analogous  to  the  heap  condition  (3);  it  guarantees  that  the  root 
of  the  tree  has  the  largest  key.  Relation  (10)  is  just  the  definition  of  the  DIST 
fields  as  stated  above.  Relation  (11)  is  the  interesting  innovation:  It  implies  that 
a shortest path to $\Lambda$ may always be obtained by moving to the right. We shall
say  that  a binary  tree  with  this  property  is  a leftist  tree,  because  it  tends  to  lean 
so  heavily  to  the  left. 

It is clear from these definitions that $\mathrm{DIST}(P) = n$ implies the existence of
at least $2^n$ empty subtrees below P; otherwise there would be a shorter path
from P to $\Lambda$. Thus, if there are N nodes in a leftist tree, the path leading
downward from the root towards the right contains at most $\lfloor\lg(N+1)\rfloor$ nodes.
It is possible to insert a new node into the priority queue by traversing this path
(see exercise 33); hence only $O(\log N)$ steps are needed in the worst case. The
best  case  occurs  when  the  tree  is  linear  (all  RIGHT  links  are  A),  and  the  worst 
case  occurs  when  the  tree  is  perfectly  balanced. 

To  remove  the  node  at  the  root,  we  simply  need  to  merge  its  two  subtrees. 
The  operation  of  merging  two  disjoint  leftist  trees,  pointed  to  respectively  by 
P and Q, is conceptually simple: If $\mathrm{KEY}(P) \ge \mathrm{KEY}(Q)$ we take P as the root
and merge Q with P’s right subtree; then DIST(P) is updated, and LEFT(P) is
interchanged with RIGHT(P) if necessary. A detailed description of this process
is  not  difficult  to  devise  (see  exercise  33). 
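The merging process just described can be sketched recursively in Python. This is a minimal sketch under our own assumptions: keys are integers, `None` plays the role of the null link $\Lambda$, and the class and function names are ours. It follows the conceptual description above rather than the detailed algorithm requested in exercise 33.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    dist: int = 1                     # DIST(P); a node with two null links has DIST 1
    left: Optional["Node"] = None     # None stands for the null link
    right: Optional["Node"] = None


def dist(p: Optional[Node]) -> int:
    """DIST(P), with DIST of the null link defined to be 0."""
    return 0 if p is None else p.dist


def merge(p: Optional[Node], q: Optional[Node]) -> Optional[Node]:
    """Merge two disjoint leftist trees and return the root of the result."""
    if p is None:
        return q
    if q is None:
        return p
    if p.key < q.key:                 # the larger key becomes the root, as (9) requires
        p, q = q, p
    p.right = merge(p.right, q)       # merge Q with P's right subtree
    if dist(p.left) < dist(p.right):  # swap subtrees if needed to restore (11)
        p.left, p.right = p.right, p.left
    p.dist = 1 + dist(p.right)        # (10): RIGHT(P) now has the smaller DIST
    return p


def insert(root: Optional[Node], key: int) -> Node:
    """Insert by merging with a one-node tree."""
    return merge(root, Node(key))


def delete_max(root: Node) -> Tuple[int, Optional[Node]]:
    """Remove the root by merging its two subtrees; return (key, new root)."""
    return root.key, merge(root.left, root.right)
```

Because each recursive call of `merge` moves one step down a right spine, and each right spine has at most $\lfloor\lg(N+1)\rfloor$ nodes, the recursion depth stays logarithmic.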

Comparison of priority queue techniques. When the number of nodes,
N, is small, it is best to use one of the straightforward linear list methods to
maintain a priority queue; but when N is large, a $\log N$ method using heaps
or leftist trees is obviously much faster. In Section 6.2.3 we shall discuss the
representation of linear lists as balanced trees, and this leads to a third $\log N$
method  suitable  for  priority  queue  implementation.  It  is  therefore  appropriate 
to  compare  these  three  techniques. 
We have seen that leftist tree operations tend to be slightly faster than heap
operations,  although  heaps  consume  less  memory  space  because  they  have  no 
link  fields.  Balanced  trees  take  about  the  same  space  as  leftist  trees,  perhaps 
slightly  less;  the  operations  are  slower  than  heaps,  and  the  programming  is  more 
complicated,  but  the  balanced  tree  structure  is  considerably  more  flexible  in 
several  ways.  When  using  a heap  or  a leftist  tree  we  cannot  predict  very  easily 
what  will  happen  to  two  items  with  equal  keys;  it  is  impossible  to  guarantee 
that  items  with  equal  keys  will  be  treated  in  a last-in-first-out  or  first-in-first- 
out  manner,  unless  the  key  is  extended  to  include  an  additional  “serial  number  of 
insertion”  field  so  that  no  equal  keys  are  really  present.  With  balanced  trees,  on 
the  other  hand,  we  can  easily  stipulate  consistent  conventions  about  equal  keys, 
and we can also do things such as “insert x immediately before (or after) y.”
Balanced  trees  are  symmetrical,  so  that  we  can  delete  either  the  largest  or  the 
smallest  element  at  any  time,  while  heaps  and  leftist  trees  must  be  oriented 
one  way  or  the  other.  (See  exercise  31,  however,  which  shows  how  to  construct 
symmetrical  heaps.)  Balanced  trees  can  be  used  for  searching  as  well  as  for 
sorting;  and  we  can  rather  quickly  remove  consecutive  blocks  of  elements  from 
a balanced tree. But $\Omega(N)$ steps are needed in general to merge two balanced
trees, while leftist trees can be merged in only $O(\log N)$ steps.

In  summary,  heaps  use  minimum  memory;  leftist  trees  are  great  for  merging 
disjoint  priority  queues;  and  the  flexibility  of  balanced  trees  is  available,  if 
necessary,  at  reasonable  cost. 
Many new ways to represent priority queues have been discovered since the
pioneering work of Williams and Crane discussed above. Programmers now
have  a large  menu  of  options  to  ponder,  besides  simple  lists,  heaps,  leftist  or 
balanced  trees: 

• stratified  trees,  which  provide  symmetrical  priority  queue  operations  in  only 
$O(\log\log M)$ steps when all keys lie in a given range $0 \le K < M$ [P. van
Emde  Boas,  R.  Kaas,  and  E.  Zijlstra,  Math.  Systems  Theory  10  (1977), 
99-127]; 

• binomial  queues  [J.  Vuillemin,  CACM  21  (1978),  309-315;  M.  R.  Brown, 
SICOMP  7 (1978),  298-319]; 

• pagodas [J. Françon, G. Viennot, and J. Vuillemin, FOCS 19 (1978), 1-7];

• pairing heaps [M. L. Fredman, R. Sedgewick, D. D. Sleator, and R. E. Tarjan,
Algorithmica 1 (1986), 111-129; J. T. Stasko and J. S. Vitter, CACM 30
(1987),  234-249;  M.  L.  Fredman,  JACM  46  (1999),  473-501]; 

• skew  heaps  [D.  D.  Sleator  and  R.  E.  Tarjan,  SICOMP  15  (1986),  52-59]; 

• Fibonacci heaps [M. L. Fredman and R. E. Tarjan, JACM 34 (1987), 596-615]
and the more general AF-heaps [M. L. Fredman and D. E. Willard,
J. Computer and System Sci. 48 (1994), 533-551];

• calendar  queues  [R.  Brown,  CACM  31  (1988),  1220-1227;  G.  A.  Davison, 
CACM  32  (1989),  1241-1243]; 

• relaxed  heaps  [J.  R.  Driscoll,  H.  N.  Gabow,  R.  Shrairman,  and  R.  E.  Tarjan, 
CACM  31  (1988),  1343-1354]; 

• fishspear  [M.  J.  Fischer  and  M.  S.  Paterson,  JACM  41  (1994),  3-30]; 

• hot  queues  [B.  V.  Cherkassky,  A.  V.  Goldberg,  and  C.  Silverstein,  SICOMP 
28  (1999),  1326-1346]; 

etc.  Not  all  of  these  methods  will  survive  the  test  of  time;  leftist  trees  are  in  fact 
already  obsolete,  except  for  applications  with  a strong  tendency  towards  last-in- 
first-out  behavior.  Detailed  implementations  and  expositions  of  binomial  queues 
and  Fibonacci  heaps  can  be  found  in  D.  E.  Knuth,  The  Stanford  GraphBase 
(New  York:  ACM  Press,  1994),  475-489. 


* Analysis  of  heapsort.  Algorithm  H is  rather  complicated,  so  it  probably  will 
never  submit  to  a complete  mathematical  analysis;  but  several  of  its  properties 
can  be  deduced  without  great  difficulty.  Therefore  we  shall  conclude  this  section 
by  studying  the  anatomy  of  a heap  in  some  detail. 

Figure  28  shows  the  shape  of  a heap  with  26  elements;  each  node  has  been 
labeled  in  binary  notation  corresponding  to  its  subscript  in  the  heap.  Asterisks 
in  this  diagram  denote  the  special  nodes,  those  that  lie  on  the  path  from  1 to  N. 

One  of  the  most  important  attributes  of  a heap  is  the  collection  of  its  subtree 
sizes.  For  example,  in  Fig.  28  the  sizes  of  the  subtrees  rooted  at  1,  2, . . . , 26  are, 
respectively, 


26*,  15,  10*,  7,  7,  6*,  3,  3,  3,  3,  3,  3,  2*,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, 1,  1*.  (12) 


Fig. 28. A heap of $26 = (11010)_2$ elements looks like this.

Asterisks denote special subtrees, rooted at the special nodes; exercise 20 shows
that if the binary representation of N is

$$N = (b_n b_{n-1} \ldots b_1 b_0)_2, \qquad n = \lfloor\lg N\rfloor, \qquad (13)$$

then the special subtree sizes are always

$$(1b_{n-1}\ldots b_1b_0)_2,\ (1b_{n-2}\ldots b_1b_0)_2,\ \ldots,\ (1b_1b_0)_2,\ (1b_0)_2,\ (1)_2. \qquad (14)$$


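Nonspecial subtrees are always perfectly balanced, so their size is always of the
form $2^k - 1$. Exercise 21 shows that the nonspecial sizes consist of exactly

$$\Bigl\lfloor\frac{N-1}{2}\Bigr\rfloor\ 1\text{s},\quad \Bigl\lfloor\frac{N-2}{4}\Bigr\rfloor\ 3\text{s},\quad \Bigl\lfloor\frac{N-4}{8}\Bigr\rfloor\ 7\text{s},\quad \ldots,\quad \Bigl\lfloor\frac{N-2^{n-1}}{2^n}\Bigr\rfloor\ (2^n-1)\text{s}. \qquad (15)$$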


For  example,  Fig.  28  contains  twelve  nonspecial  subtrees  of  size  1,  six  of  size  3, 
two  of  size  7,  and  one  of  size  15. 
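The subtree sizes, and their split into special and nonspecial subtrees, are easy to tabulate by machine. This Python sketch is an illustration under our own assumptions (1-origin node numbers; the function names are hypothetical): `subtree_sizes` computes every $s_l$ directly from the tree shape, `special_sizes` evaluates (14), and `nonspecial_counts` evaluates (15).

```python
def subtree_sizes(N):
    """Return [s_1, ..., s_N], the subtree sizes of the N-node heap shape."""
    s = [0] * (2 * N + 2)               # s[k] stays 0 for k > N (empty subtree)
    for k in range(N, 0, -1):
        s[k] = 1 + s[2 * k] + s[2 * k + 1]
    return s[1:N + 1]


def special_nodes(N):
    """The special nodes: those on the path from the root 1 down to node N."""
    path = []
    k = N
    while k >= 1:
        path.append(k)
        k //= 2
    return path[::-1]


def special_sizes(N):
    """Formula (14): (1 b_{n-1}...b_0)_2, (1 b_{n-2}...b_0)_2, ..., (1)_2."""
    bits = bin(N)[3:]                   # digits b_{n-1}...b_0 after the leading 1
    return [int("1" + bits[t:], 2) for t in range(len(bits) + 1)]


def nonspecial_counts(N):
    """Formula (15): how many nonspecial subtrees have size 2^k - 1, k = 1..n."""
    n = N.bit_length() - 1              # n = floor(lg N)
    return {2**k - 1: (N - 2**(k - 1)) // 2**k for k in range(1, n + 1)}
```

For N = 26, `subtree_sizes` reproduces (12), `special_sizes` gives 26, 10, 6, 2, 1, and `nonspecial_counts` gives {1: 12, 3: 6, 7: 2, 15: 1}, in agreement with Fig. 28.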

Let $s_l$ be the size of the subtree whose root is l, and let $M_N$ be the multiset
$\{s_1, s_2, \ldots, s_N\}$ of all these sizes. We can calculate $M_N$ easily for any given N
by using (14) and (15). Exercise 5.1.4–20 tells us that the total number of ways
to arrange the integers $\{1, 2, \ldots, N\}$ into a heap is

$$N!/s_1s_2\ldots s_N = N!\Big/\prod\{s \mid s \in M_N\}. \qquad (16)$$

For example, the number of ways to place the 26 letters $\{A, B, C, \ldots, Z\}$ into
Fig. 28 so that vertical lines preserve alphabetic order is

$$26!\,/\,(26 \cdot 10 \cdot 6 \cdot 2 \cdot 1 \cdot 1^{12} \cdot 3^6 \cdot 7^2 \cdot 15^1).$$
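Formula (16) is easy to check by brute force for small N. This Python sketch is illustrative only (the function names are ours): `heap_count` evaluates (16) from the subtree sizes, and `heap_count_brute_force` simply counts the permutations of $\{1, \ldots, N\}$ that satisfy the heap condition (3), which is feasible only for small N.

```python
from itertools import permutations
from math import factorial, prod


def subtree_sizes(N):
    """Return [s_1, ..., s_N] for the N-node heap shape."""
    s = [0] * (2 * N + 2)
    for k in range(N, 0, -1):
        s[k] = 1 + s[2 * k] + s[2 * k + 1]
    return s[1:N + 1]


def heap_count(N):
    """Formula (16): N! divided by the product of all subtree sizes."""
    return factorial(N) // prod(subtree_sizes(N))


def heap_count_brute_force(N):
    """Count permutations p of 1..N with p[floor(k/2)] >= p[k] (1-origin)."""
    return sum(
        all(p[k // 2 - 1] >= p[k - 1] for k in range(2, N + 1))
        for p in permutations(range(1, N + 1))
    )
```

For N = 1, ..., 7 both functions give 1, 1, 2, 3, 8, 20, 80, and `heap_count(26)` equals the quotient displayed above.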


We  are  now  in  a position  to  analyze  the  heap-creation  phase  of  Algorithm  H, 
namely the computations that take place before the condition $l = 1$ occurs for
the  first  time  in  step  H2.  Fortunately  we  can  reduce  the  study  of  heap  creation 
to  the  study  of  independent  siftup  operations,  because  of  the  following  theorem. 

Theorem H. If Algorithm H is applied to a random permutation of $\{1, 2, \ldots, N\}$,
each of the $N!/\prod\{s \mid s \in M_N\}$ possible heaps is an equally likely outcome of the
heap-creation phase. Moreover, each of the $\lfloor N/2\rfloor$ siftup operations performed
during this phase is uniform, in the sense that each of the $s_l$ possible values of i
is equally likely when step H8 is reached.
Proof. We can apply what numerical analysts might call a “backwards analysis”;
given a possible result $K_1 \ldots K_N$ of the siftup operation rooted at l, we see that
there are exactly $s_l$ prior configurations $K'_1 \ldots K'_N$ of the file that will sift up
to that result. Each of these prior configurations has a different value of $K'_l$;
hence, working backwards, there are exactly $s_l s_{l+1} \ldots s_N$ input permutations of
$\{1, 2, \ldots, N\}$ that yield the configuration $K_1 \ldots K_N$ after the siftup at position l
has been completed.

The case $l = 1$ is typical: Let $K_1 \ldots K_N$ be a heap, and let $K'_1 \ldots K'_N$ be
a file that is transformed by siftup into $K_1 \ldots K_N$ when $l = 1$, $K = K'_1$. If
$K = K_i$, we must have $K'_i = K_{\lfloor i/2\rfloor}$, $K'_{\lfloor i/2\rfloor} = K_{\lfloor i/4\rfloor}$, etc., while $K'_j = K_j$
for all j not on the path from 1 to i. Conversely, for each i this construction
yields a file $K'_1 \ldots K'_N$ such that (a) siftup transforms $K'_1 \ldots K'_N$ into $K_1 \ldots K_N$,
and (b) $K'_{\lfloor j/2\rfloor} \ge K'_j$ for $2 \le \lfloor j/2\rfloor < j \le N$. Therefore exactly N such files
$K'_1 \ldots K'_N$ are possible, and the siftup operation is uniform. (An example of the
proof of this theorem appears in exercise 22.) ∎
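Theorem H can be checked experimentally for small N by running the heap-creation phase on every permutation. This Python sketch is an illustration under our own assumptions (the helper names are hypothetical, and exhaustive enumeration is feasible only for small N): it tallies the heaps produced and, for each l, the node i at which the siftup rooted at l stores its key in step H8.

```python
from collections import Counter
from itertools import permutations


def siftup(K, l, r):
    """Steps H3-H8 on K[l..r] (1-origin); return the node i where the key lands."""
    key = K[l]
    i, j = l, 2 * l
    while j <= r:
        if j < r and K[j] < K[j + 1]:
            j += 1
        if key >= K[j]:
            break
        K[i] = K[j]
        i, j = j, 2 * j
    K[i] = key
    return i


def creation_statistics(N):
    """Run the heap-creation phase on every permutation of 1..N.

    Returns (heaps, landing): heaps counts each resulting heap, and
    landing[l] counts the final position i of the siftup rooted at l.
    """
    heaps = Counter()
    landing = {l: Counter() for l in range(1, N // 2 + 1)}
    for p in permutations(range(1, N + 1)):
        K = [None, *p]
        for l in range(N // 2, 0, -1):
            landing[l][siftup(K, l, N)] += 1
        heaps[tuple(K[1:])] += 1
    return heaps, landing
```

For N = 5 there are 120 permutations and $5!/(5\cdot3\cdot1\cdot1\cdot1) = 8$ heaps, so each heap should arise 15 times; the siftup at l = 1 should end at each of the 5 nodes exactly 24 times, and the one at l = 2 at each of nodes 2, 4, 5 exactly 40 times.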

Referring to the quantities A, B, C, D in the analysis of Program H, we can
see that a uniform siftup operation on a subtree of size s contributes $\lfloor s/2\rfloor/s$ to
the average value of A; it contributes

$$\frac1s\bigl(0 + 1 + 1 + 2 + \cdots + \lfloor\lg s\rfloor\bigr) = \frac1s\sum_{k=1}^{s}\lfloor\lg k\rfloor = \frac1s\Bigl((s+1)\lfloor\lg s\rfloor - 2^{\lfloor\lg s\rfloor+1} + 2\Bigr)$$

to the average value of B (see exercise 1.2.4–42); and it contributes either $2/s$ or
0 to the average value of D, according as s is even or odd. The corresponding
contribution to C is somewhat more difficult to determine, so it has been left to
the reader (see exercise 26). Summing over all siftups, we find that the average
value of A during heap creation is

$$A'_N = \sum\{\lfloor s/2\rfloor/s \mid s \in M_N\}, \qquad (17)$$

and  similar  formulas  hold  for  B,  C,  and  D.  It  is  therefore  possible  to  compute 
these  average  values  exactly  without  great  difficulty,  and  the  following  table 
shows  typical  results: 


     N      A'_N      B'_N      C'_N    D'_N
    99     19.18     68.35     42.95    0.00
   100     19.93     69.39     42.71    1.84
   999    196.16    734.66    464.53    0.00
  1000    196.94    735.80    464.16    1.92
  9999   1966.02   7428.18   4695.54    0.00
 10000   1966.82   7429.39   4695.06    1.97
 10001   1966.45   7430.07   4695.84    0.00
 10002   1967.15   7430.97   4695.95    1.73
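Formula (17) and its analogues for B and D are straightforward to evaluate by machine. The Python sketch below is an illustration under our own assumptions: the function names are hypothetical, and C is omitted because its contribution is the subject of exercise 26. It sums the per-siftup contributions stated above over the multiset $M_N$; subtrees of size 1 contribute nothing, so summing over all of $M_N$ is the same as summing over the $\lfloor N/2\rfloor$ siftups.

```python
def subtree_sizes(N):
    """Return [s_1, ..., s_N] for the N-node heap shape (the multiset M_N)."""
    s = [0] * (2 * N + 2)
    for k in range(N, 0, -1):
        s[k] = 1 + s[2 * k] + s[2 * k + 1]
    return s[1:N + 1]


def creation_averages(N):
    """Exact averages A'_N, B'_N, D'_N of the heap-creation phase."""
    A = B = D = 0.0
    for s in subtree_sizes(N):
        lg = s.bit_length() - 1                       # floor(lg s)
        A += (s // 2) / s                             # floor(s/2)/s
        B += ((s + 1) * lg - 2 ** (lg + 1) + 2) / s   # (1/s) sum of floor(lg k)
        if s % 2 == 0:
            D += 2 / s                                # 2/s for even s, else 0
    return A, B, D
```

For N = 1000 it yields approximately 196.94, 735.80, and 1.92; for N = 99 it yields about 19.18 for A and exactly 0 for D, in agreement with the table.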

Asymptotically speaking, we may ignore the special subtree sizes in $M_N$, and we
find for example that

$$A'_N = \frac N2\cdot\frac01 + \frac N4\cdot\frac13 + \frac N8\cdot\frac37 + \cdots = \bigl(1 - \tfrac12\alpha\bigr)N + O(\log N), \qquad (18)$$
where

$$\alpha = \sum_{k\ge1}\frac{1}{2^k-1} = 1.60669\,51524\,15291\,76378\,33015\,23190\,92458\,04806{-}. \qquad (19)$$

(This value was first computed to high precision by J. W. Wrench, Jr., using the
series transformation of exercise 27. Paul Erdős has proved that α is irrational
[J. Indian Math. Soc. 12 (1948), 63–66], and Peter Borwein has demonstrated
the irrationality of many similar constants [Proc. Camb. Phil. Soc. 112 (1992),
141–146].) For large N, we may use the approximate formulas

$$\begin{aligned}
A'_N &\approx 0.1967N + (-1)^N\,0.3;\\
B'_N &\approx 0.74403N - 1.3\ln N;\\
C'_N &\approx 0.47034N - 0.8\ln N;\\
D'_N &\approx (1.8 \pm 0.2)\,[N \text{ even}].
\end{aligned} \qquad (20)$$
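The constant α in (19) can be approximated by summing its series directly in high-precision decimal arithmetic. This is a plain summation sketch, not Wrench's series transformation of exercise 27; the function name `alpha` and the default of 50 digits are our own choices.

```python
from decimal import Decimal, localcontext


def alpha(digits=50):
    """alpha = sum_{k>=1} 1/(2^k - 1), summed directly with Decimal arithmetic.

    Terms with 2^k beyond 10^(digits+10) are negligible, so the series is cut
    off after about 3.33 * (digits + 10) terms (log2(10) is about 3.32).
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10
        terms = int(3.33 * (digits + 10)) + 1
        return sum(Decimal(1) / (Decimal(2) ** k - 1) for k in range(1, terms + 1))
```

The first 35 decimals agree with (19), and $1 - \alpha/2 \approx 0.19665$ is the coefficient of N in (18), consistent with the leading term $0.1967N$ of the approximation for $A'_N$.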

The  minimum  and  maximum  values  are  also  readily  determined.  Only  O(N) 
steps  are  needed  to  create  the  heap  (see  exercise  23). 

This theory nicely explains the heap-creation phase of Algorithm H. But
the selection phase is another story, which remains to be written! Let $A''_N$, $B''_N$,
$C''_N$, and $D''_N$ denote the average values of A, B, C, and D during the selection
phase when N elements are being heapsorted. The behavior of Algorithm H on
random input is subject to comparatively little fluctuation about the empirically
determined average values

$$\begin{aligned}
A''_N &\approx 0.152N;\\
B''_N &\approx N\lg N - 2.61N;\\
C''_N &\approx \tfrac12 N\lg N - 1.41N;\\
D''_N &\approx \lg N \pm 2;
\end{aligned} \qquad (21)$$

but no adequate theoretical explanation for the behavior of $D''_N$ or for the
conjectured constants 0.152, 2.61, or 1.41 has yet been found. The leading
terms of $B''_N$ and $C''_N$ have, however, been established in an elegant manner by
R. Schaffer and R. Sedgewick; see exercise 30. Schaffer has also proved that the
minimum and maximum possible values of C are respectively asymptotic to
$\tfrac14 N\lg N$ and $\tfrac34 N\lg N$.

EXERCISES 

1.  [10]  Is  straight  selection  (Algorithm  S)  a stable  sorting  method? 

2.  [15]  Why  does  it  prove  to  be  more  convenient  to  select  the  largest  key,  then 
the  second-largest,  etc.,  in  Algorithm  S,  instead  of  first  finding  the  smallest,  then  the 
second-smallest,  etc.? 

3. [M21] (a) Prove that if the input to Algorithm S is a random permutation of
$\{1, 2, \ldots, N\}$, then the first iteration of steps S2 and S3 yields a random permutation
of $\{1, 2, \ldots, N-1\}$ followed by N. (In other words, the presence of each permutation
of $\{1, 2, \ldots, N-1\}$ in $K_1 \ldots K_{N-1}$ is equally likely.) (b) Therefore if $B_N$ denotes the
average value of the quantity B in Program S, given randomly ordered input, we have
$B_N = H_N - 1 + B_{N-1}$. [Hint: See Eq. 1.2.10–(16).]

► 4.  [M25]  Step  S3  of  Algorithm  S accomplishes  nothing  when  i = j;  is  it  a good  idea 
to  test  whether  or  not  i = j before  doing  step  S3?  What  is  the  average  number  of 
times  the  condition  i = j will  occur  in  step  S3  for  random  input? 

5.  [20]  What  is  the  value  of  the  quantity  B in  the  analysis  of  Program  S,  when  the 
input is $N \ldots 3\ 2\ 1$?

6. [M29] (a) Let $a_1a_2\ldots a_N$ be a permutation of $\{1, 2, \ldots, N\}$ having C cycles,
I inversions, and B changes to the right-to-left maxima when sorted by Program S.
Prove that $2B \le I + N - C$. [Hint: See exercise 5.2.2–1.] (b) Show that
$I + N - C \le \lfloor N^2/2\rfloor$; hence B can never exceed $\lfloor N^2/4\rfloor$.

7.  [Mil]  Find  the  variance  of  the  quantity  B in  Program  S,  as  a function  of  N, 
assuming  random  input. 

► 8. [24] Show that if the search for $\max(K_1, \ldots, K_j)$ in step S2 is carried out by
examining keys in left-to-right order $K_1, K_2, \ldots, K_j$, instead of going from right to
left  as  in  Program  S,  it  is  often  possible  to  reduce  the  number  of  comparisons  needed 
on  the  next  iteration  of  step  S2.  Write  a MIX  program  based  on  this  observation. 

9.  [M25]  What  is  the  average  number  of  comparisons  performed  by  the  algorithm 
of  exercise  8,  for  random  input? 

10.  [12]  What  will  be  the  configuration  of  the  tree  in  Fig.  23  after  14  of  the  original 
16  items  have  been  output? 

11.  [10]  What  will  be  the  configuration  of  the  tree  in  Fig.  24  after  the  element  908 
has  been  output? 

12. [M20] How many times will $-\infty$ be compared with $-\infty$ when the bottom-up
method of Fig. 23 is used to sort a file of $2^n$ elements into order?

13. [20] (J. W. J. Williams.) Step H4 of Algorithm H distinguishes between the three
cases $j < r$, $j = r$, and $j > r$. Show that if $K \ge K_{r+1}$ it would be possible to simplify
step H4 so that only a two-way branch is made. How could the condition $K \ge K_{r+1}$
be ensured throughout the heapsort process, by modifying step H2?

14.  [10]  Show  that  simple  queues  are  special  cases  of  priority  queues.  (Explain  how 
keys  can  be  assigned  to  the  elements  so  that  a largest-in-first-out  procedure  is  equivalent 
to  first-in-first-out.)  Is  a stack  also  a special  case  of  a priority  queue? 

► 15.  [M22]  (B.  A.  Chartres.)  Design  a high-speed  algorithm  that  builds  a table  of 
the  prime  numbers  < N,  making  use  of  a priority  queue  to  avoid  division  operations. 
[Hint:  Let  the  smallest  key  in  the  priority  queue  be  the  least  odd  nonprime  number 
greater  than  the  last  odd  number  considered  as  a prime  candidate.  Try  to  minimize 
the  number  of  elements  in  the  queue.] 

16.  [20]  Design  an  efficient  algorithm  that  inserts  a new  key  into  a given  heap  of 
n elements,  producing  a heap  of  n + 1 elements. 

17.  [20]  The  algorithm  of  exercise  16  can  be  used  for  heap  creation,  instead  of  the 
“decrease  l to  1”  method  used  in  Algorithm  H.  Do  both  methods  create  the  same  heap 
when  they  begin  with  the  same  input  file? 

► 18.  [21]  (R.  W.  Floyd.)  During  the  selection  phase  of  heapsort,  the  key  K tends  to 
be  quite  small,  so  that  nearly  all  of  the  comparisons  in  step  H6  find  K < Kj.  Show 
how  to  modify  the  algorithm  so  that  K is  not  compared  with  Kj  in  the  main  loop  of 
the  computation,  thereby  nearly  cutting  the  average  number  of  comparisons  in  half. 
19. [21] Design an algorithm that deletes a given element of a heap of length N,
producing a heap of length N − 1.

20.  [M20]  Prove  that  (14)  gives  the  special  subtree  sizes  in  a heap. 

21. [M24] Prove that (15) gives the nonspecial subtree sizes in a heap.

► 22.  [20]  What  permutations  of  {1,  2, 3, 4,  5}  are  transformed  into  5 3 4 1 2 by  the  heap- 
creation  phase  of  Algorithm  H? 

23. [M28] (a) Prove that the length of scan, B, in a siftup algorithm never exceeds
⌊lg(r/l)⌋. (b) According to (8), B can never exceed N⌊lg N⌋ in any particular
application of Algorithm H. Find the maximum value of B as a function of N, taken over
all possible input files. (You must prove that an input file exists such that B takes on
this maximum value.)

24. [M24] Derive an exact formula for the standard deviation of B'_N (the total length
of scan during the heap-creation phase of Algorithm H).

25. [M20] What is the average value of the contribution to C made during the siftup
pass when l = 1 and r = N, if N = 2^{n+1} − 1?

26.  [MSO]  Solve  exercise  25,  (a)  for  N = 26,  (b)  for  general  N. 

27. [M25] (T. Clausen, 1828.) Prove that

$$\sum_{n\ge1}\frac{x^n}{1-x^n}=\sum_{n\ge1}\frac{1+x^n}{1-x^n}\,x^{n^2}.$$

(Setting x = ½ gives a very rapidly converging series for the evaluation of (19).)

28.  [35]  Explore  the  idea  of  ternary  heaps,  based  on  complete  ternary  trees  instead 
of  binary  trees.  Do  ternary  heaps  sort  faster  than  binary  heaps? 

29. [26] (W. S. Brown.) Design an algorithm for multiplication of polynomials or
power series $(a_1x^{i_1} + a_2x^{i_2} + \cdots)(b_1x^{j_1} + b_2x^{j_2} + \cdots)$, in which the coefficients of
the answer $c_1x^{i_1+j_1} + \cdots$ are generated in order as the input coefficients are being
multiplied. [Hint: Use an appropriate priority queue.]

► 30. [HM35] (R. Schaffer and R. Sedgewick.) Let h_{nm} be the number of heaps on
the elements {1, 2, ..., n} for which the selection phase of heapsort does exactly m
promotions. Prove that $h_{nm} \le 2^m \prod_{k=2}^{n} \lg k$, and use this relation to show that the
average number of promotions performed by Algorithm H is N lg N + O(N log log N).

31.  [37]  (J.  W.  J.  Williams.)  Show  that  if  two  heaps  are  placed  “back  to  back”  in  a 
suitable  way,  it  is  possible  to  maintain  a structure  in  which  either  the  smallest  or  the 
largest element can be deleted at any time in O(log n) steps. (Such a structure may be
called  a priority  deque.) 

32. [M28] Prove that the number of heapsort promotions, B, is always at least
½N lg N + O(N), if the keys being sorted are distinct. Hint: Consider the movement
of the largest ⌈N/2⌉ keys.

33.  [21]  Design  an  algorithm  that  merges  two  disjoint  priority  queues,  represented 
as  leftist  trees,  into  one.  (In  particular,  if  one  of  the  given  queues  contains  a single 
element,  your  algorithm  will  insert  it  into  the  other  queue.) 

34. [M41] How many leftist trees with N nodes are possible, ignoring the KEY values?
The sequence begins 1, 1, 2, 4, 8, 17, 38, 87, 203, 482, 1160, ...; show that the number
is asymptotically $ab^N N^{-3/2}$ for suitable constants a and b, using techniques like those
of exercise 2.3.4.4–4.
35. [26] If UP links are added to a leftist tree (see the discussion of triply linked trees in
Section 6.2.3), it is possible to delete an arbitrary node P from within the priority queue
as follows: Replace P by the merger of LEFT(P) and RIGHT(P); then adjust the DIST
fields of P’s ancestors, possibly swapping left and right subtrees, until either reaching
the root or reaching a node whose DIST is unchanged.

Prove that this process never requires changing more than O(log N) of the DIST
fields, if there are N nodes in the tree, even though the tree may contain very long
upward paths.

36.  [IS]  (Least-recently-used  page  replacement.)  Many  operating  systems  make  use  of 
the  following  type  of  algorithm:  A collection  of  nodes  is  subjected  to  two  operations, 
(i)  “using”  a node,  and  (ii)  replacing  the  least-recently-used  node  by  a new  node.  What 
data  structure  makes  it  easy  to  ascertain  the  least-recently-used  node? 

37. [HM32] Let e_N(k) be the expected treewise distance of the kth-largest element
from the root, in a random heap of N elements, and let e(k) = lim_{N→∞} e_N(k). Thus
e(1) = 0, e(2) = 1, e(3) = 1.5, and e(4) = 1.875. Find the asymptotic value of e(k) to
within O(k^{−1}).

38. [M21] Find a simple recurrence relation for the multiset M_N of subtree sizes in a
heap or in a complete binary tree with N internal nodes.

5.2.4.  Sorting  by  Merging 

Merging  (or  collating)  means  the  combination  of  two  or  more  ordered  files  into 
a single  ordered  file.  For  example,  we  can  merge  the  two  files  503  703  765  and 
087  512  677  to  obtain  087  503  512  677  703  765.  A simple  way  to  accomplish  this 
is  to  compare  the  two  smallest  items,  output  the  smallest,  and  then  repeat  the 
same  process.  Starting  with 

    503 703 765
    087 512 677

we obtain

    087        503 703 765
               512 677

then

    087 503        703 765
                   512 677

and

    087 503 512        703 765
                       677

and so on. Some care is necessary when one of the two files becomes exhausted;
a detailed description of the process appears in the following algorithm:

Algorithm M (Two-way merge). This algorithm merges nonempty ordered files
x_1 ≤ x_2 ≤ ··· ≤ x_m and y_1 ≤ y_2 ≤ ··· ≤ y_n into a single file z_1 ≤ z_2 ≤ ··· ≤ z_{m+n}.

M1. [Initialize.] Set i ← 1, j ← 1, k ← 1.

M2. [Find smaller.] If x_i ≤ y_j, go to step M3, otherwise go to M5.

Fig. 29. Merging x_1 ≤ ··· ≤ x_m with y_1 ≤ ··· ≤ y_n.


M3. [Output x_i.] Set z_k ← x_i, k ← k + 1, i ← i + 1. If i ≤ m, return to M2.

M4. [Transmit y_j, ..., y_n.] Set (z_k, ..., z_{m+n}) ← (y_j, ..., y_n) and terminate the
algorithm.

M5. [Output y_j.] Set z_k ← y_j, k ← k + 1, j ← j + 1. If j ≤ n, return to M2.

M6. [Transmit x_i, ..., x_m.] Set (z_k, ..., z_{m+n}) ← (x_i, ..., x_m) and terminate
the algorithm. |
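The steps of Algorithm M translate almost line for line into a high-level language. The following Python sketch is only an illustration, not part of the original text: it uses 0-based list indices instead of the 1-based subscripts above, and the function name `merge_two` is hypothetical. Keys are written as plain integers (087 becomes 87, because Python 3 rejects leading zeros in integer literals).

```python
def merge_two(x, y):
    """Two-way merge in the style of Algorithm M (0-based indices).

    x and y are nonempty lists, each in nondecreasing order.
    Ties go to x (step M2 tests x[i] <= y[j]), so equal keys keep their order.
    """
    m, n = len(x), len(y)
    z = []
    i = j = 0                      # M1 (0-based instead of 1-based)
    while True:
        if x[i] <= y[j]:           # M2: find smaller
            z.append(x[i])         # M3: output x_i
            i += 1
            if i < m:
                continue
            z.extend(y[j:])        # M4: transmit y_j, ..., y_n
            return z
        else:
            z.append(y[j])         # M5: output y_j
            j += 1
            if j < n:
                continue
            z.extend(x[i:])        # M6: transmit x_i, ..., x_m
            return z
```

For example, `merge_two([503, 703, 765], [87, 512, 677])` reproduces the merge worked out at the beginning of this section.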

We shall see in Section 5.3.2 that this straightforward procedure is essentially
the best possible way to merge on a conventional computer, when m ≈ n. (On
the other hand, when m is much smaller than n, it is possible to devise more
efficient merging algorithms, although they are rather complicated in general.)
Algorithm M could be made slightly simpler without much loss of efficiency by
placing sentinel elements x_{m+1} = y_{n+1} = ∞ at the end of the input files, stopping
just before ∞ is output. For an analysis of Algorithm M, see exercise 2.
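A minimal Python sketch of the sentinel variant follows; the function name `merge_with_sentinels` is hypothetical, and the sketch assumes real-number keys so that `math.inf` can play the role of ∞. Running the loop exactly m + n times is the same as stopping just before a sentinel would be output, and it removes the two "file exhausted" cases of steps M4 and M6.

```python
import math


def merge_with_sentinels(x, y):
    """Two-way merge using sentinels x_{m+1} = y_{n+1} = infinity.

    Assumes finite numeric keys, so math.inf compares larger than every key.
    """
    xs = list(x) + [math.inf]      # sentinel x_{m+1}
    ys = list(y) + [math.inf]      # sentinel y_{n+1}
    z = []
    i = j = 0
    # Exactly m + n outputs, so a sentinel is never copied into z.
    for _ in range(len(x) + len(y)):
        if xs[i] <= ys[j]:
            z.append(xs[i])
            i += 1
        else:
            z.append(ys[j])
            j += 1
    return z
```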

The  total  amount  of  work  involved  in  Algorithm  M is  essentially  propor- 
tional to  m + n,  so  it  is  clear  that  merging  is  a simpler  problem  than  sorting. 
Furthermore,  we  can  reduce  the  problem  of  sorting  to  merging,  because  we  can 
repeatedly  merge  longer  and  longer  subfiles  until  everything  is  in  sort.  We  may 
consider  this  to  be  an  extension  of  the  idea  of  insertion  sorting:  Inserting  a new 
element  into  a sorted  file  is  the  special  case  n = 1 of  merging.  If  we  want  to 
speed  up  the  insertion  process  we  can  consider  inserting  several  elements  at  a 
time,  “batching”  them,  and  this  leads  naturally  to  the  general  idea  of  merge 
sorting.  From  a historical  point  of  view,  merge  sorting  was  one  of  the  very  first 
methods  proposed  for  computer  sorting;  it  was  suggested  by  John  von  Neumann 
as  early  as  1945  (see  Section  5.5). 

We  shall  study  merging  in  considerable  detail  in  Section  5.4,  with  regard 
to  external  sorting  algorithms;  our  main  concern  in  the  present  section  is  the 
somewhat  simpler  question  of  merge  sorting  within  a high-speed  random-access 
memory. 

Table  1 shows  a merge  sort  that  “burns  the  candle  at  both  ends”  in  a manner 
similar  to  the  scanning  procedure  we  have  used  in  quicksort  and  radix  exchange: 
We  examine  the  input  from  the  left  and  from  the  right,  working  towards  the 
middle. Ignoring the top line of the table for a moment, let us consider the
transformation  from  line  2 to  line  3.  At  the  left  we  have  the  ascending  run  503 
703  765;  at  the  right,  reading  leftwards,  we  have  the  run  087  512  677.  Merging 
these  two  sequences  leads  to  087  503  512  677  703  765,  which  is  placed  at  the 
left  of  line  3.  Then  the  keys  061  612  908  in  line  2 are  merged  with  170  509  897, 
and  the  result  (061  170  509  612  897  908)  is  recorded  at  the  right  end  of  line  3. 
Finally,  154  275  426  653  is  merged  with  653  — discovering  the  overlap  before  it 
causes  any  harm  — and  the  result  is  placed  at  the  left,  following  the  previous  run. 
Line 2 of the table was formed in the same way from the original input in line 1.


Table  1 

NATURAL  TWO-WAY  MERGE  SORTING 
503 | 087 512 | 061 908 | 170 897 | 275 | 653 | 426 154 | 509 | 612 | 677 | 765 703

503 703 765 | 061 612 908 | 154 275 426 653 | 897 509 170 | 677 512 087

087 503 512 677 703 765 | 154 275 426 653 | 908 897 612 509 170 061

061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

Vertical  lines  in  Table  1 represent  the  boundaries  between  runs.  They  are  the 
so-called  stepdowns,  where  a smaller  element  follows  a larger  one  in  the  direction 
of  reading.  We  generally  encounter  an  ambiguous  situation  in  the  middle  of  the 
file,  when  we  read  the  same  key  from  both  directions;  this  causes  no  problem  if  we 
are  a little  bit  careful  as  in  the  following  algorithm.  The  method  is  traditionally 
called  a “natural”  merge  because  it  makes  use  of  the  runs  that  occur  naturally 
in  its  input. 

Algorithm N (Natural two-way merge sort). Records R_1, ..., R_N are sorted
using two areas of memory, each of which is capable of holding N records. For
convenience, we shall say that the records of the second area are R_{N+1}, ..., R_{2N},
although it is not really necessary that R_{N+1} be adjacent to R_N. The initial
contents of R_{N+1}, ..., R_{2N} are immaterial. After sorting is complete, the keys
will be in order, K_1 ≤ ··· ≤ K_N.

N1. [Initialize.] Set s ← 0. (When s = 0, we will be transferring records from
the (R_1, ..., R_N) area to the (R_{N+1}, ..., R_{2N}) area; when s = 1, we will
be going the other way.)

N2. [Prepare for pass.] If s = 0, set i ← 1, j ← N, k ← N + 1, l ← 2N; if
s = 1, set i ← N + 1, j ← 2N, k ← 1, l ← N. (Variables i, j, k, l point to
the current positions in the “source files” being read and the “destination
files” being written.) Set d ← 1, f ← 1. (Variable d gives the current
direction of output; f is set to zero if future passes are necessary.)

N3. [Compare K_i : K_j.] If K_i > K_j, go to step N8. If i = j, set R_k ← R_i and
go to N13.
Fig. 30. Merge sorting.


N4. [Transmit R_i.] (Steps N4–N7 are analogous to steps M3–M4 of Algorithm M.)
Set R_k ← R_i, k ← k + d.

N5. [Stepdown?] Increase i by 1. Then if K_{i−1} ≤ K_i, go back to step N3.

N6. [Transmit R_j.] Set R_k ← R_j, k ← k + d.

N7. [Stepdown?] Decrease j by 1. Then if K_{j+1} ≤ K_j, go back to step N6;
otherwise go to step N12.

N8. [Transmit R_j.] (Steps N8–N11 are dual to steps N4–N7.) Set R_k ← R_j,
k ← k + d.

N9. [Stepdown?] Decrease j by 1. Then if K_{j+1} ≤ K_j, go back to step N3.

N10. [Transmit R_i.] Set R_k ← R_i, k ← k + d.

N11. [Stepdown?] Increase i by 1. Then if K_{i−1} ≤ K_i, go back to step N10.

N12. [Switch sides.] Set f ← 0, d ← −d, and interchange k ↔ l. Return to
step N3.

N13. [Switch areas.] If f = 0, set s ← 1 − s and return to N2. Otherwise sorting
is complete; if s = 0, set (R_1, ..., R_N) ← (R_{N+1}, ..., R_{2N}). (This last
copying operation is unnecessary if it is acceptable to have the output in
(R_{N+1}, ..., R_{2N}) about half of the time.) |
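To make the control flow of Algorithm N concrete, here is a hedged Python transcription that keeps the 1-based layout of the text: R[1..N] is the first area, R[N+1..2N] the second, and R[0] is unused. The variable `step` names the step to execute next, so each "go to" of the algorithm becomes an assignment to `step`. The function name, and the choice to prefill the second area with a copy of the input (its initial contents are immaterial), are assumptions of this sketch.

```python
def natural_merge_sort(keys):
    """Sketch of Algorithm N (natural two-way merge sort).

    R[1..N] is the first memory area and R[N+1..2N] the second; R[0] is unused.
    """
    n = len(keys)
    if n == 0:
        return []
    R = [None] + list(keys) + list(keys)    # second area: contents immaterial
    s = 0                                   # N1
    step = "N2"
    while True:
        if step == "N2":                    # Prepare for pass
            if s == 0:
                i, j, k, l = 1, n, n + 1, 2 * n
            else:
                i, j, k, l = n + 1, 2 * n, 1, n
            d, f = 1, 1
            step = "N3"
        elif step == "N3":                  # Compare K_i : K_j
            if R[i] > R[j]:
                step = "N8"
            elif i == j:
                R[k] = R[i]
                step = "N13"
            else:
                step = "N4"
        elif step == "N4":                  # Transmit R_i
            R[k] = R[i]
            k += d
            step = "N5"
        elif step == "N5":                  # Stepdown on the left?
            i += 1
            step = "N3" if R[i - 1] <= R[i] else "N6"
        elif step == "N6":                  # Transmit R_j
            R[k] = R[j]
            k += d
            step = "N7"
        elif step == "N7":                  # Stepdown on the right?
            j -= 1
            step = "N6" if R[j + 1] <= R[j] else "N12"
        elif step == "N8":                  # Transmit R_j
            R[k] = R[j]
            k += d
            step = "N9"
        elif step == "N9":                  # Stepdown on the right?
            j -= 1
            step = "N3" if R[j + 1] <= R[j] else "N10"
        elif step == "N10":                 # Transmit R_i
            R[k] = R[i]
            k += d
            step = "N11"
        elif step == "N11":                 # Stepdown on the left?
            i += 1
            step = "N10" if R[i - 1] <= R[i] else "N12"
        elif step == "N12":                 # Switch sides
            f, d = 0, -d
            k, l = l, k
            step = "N3"
        else:                               # N13: Switch areas
            if f == 0:
                s = 1 - s
                step = "N2"
            else:
                if s == 0:
                    R[1:n + 1] = R[n + 1:2 * n + 1]
                return R[1:n + 1]
```

Because exercise 5 guarantees that i never equals j in steps N6 and N10, the sketch, like the algorithm, needs no extra bounds checks there.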

This  algorithm  contains  one  tricky  feature  that  is  explained  in  exercise  5. 

It  would  not  be  difficult  to  program  Algorithm  N for  MIX,  but  we  can 
deduce  the  essential  facts  of  its  behavior  without  constructing  the  entire  program. 
The number of ascending runs in the input will be about ½N, under random
conditions, since we have K_i > K_{i+1} with probability ½; detailed information
about the number of runs, under slightly different hypotheses, has been derived
in Section 5.1.3. Each pass cuts the number of runs in half (except in unusual
cases such as the situation in exercise 6). So the number of passes will usually be
about lg ½N = lg N − 1. Each pass requires us to transmit each of the N records,
and by exercise 2 most of the time is spent in steps N3, N4, N5, N8, N9. We
can sketch the time in the inner loop as follows, if we assume that there is low
probability of equal keys:

Step          Operations                Time

N3            CMPA, JG, JE              3.5u

Either  N4    STA, INC                  3u
        N5    INC, LDA, CMPA, JGE       6u

or      N8    STX, INC                  3u
        N9    DEC, LDX, CMPX, JGE       6u


Thus about 12.5u is spent on each record in each pass, and the total running
time will be asymptotically 12.5N lg N, for both the average case and the worst
case. This is slower than quicksort’s average time, and it may not be enough
better than heapsort to justify taking twice as much memory space, since the
asymptotic running time of Program 5.2.3H is never more than 18N lg N.
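The estimate of about ½N initial runs can be checked empirically. The small Python sketch below (helper names are hypothetical) counts runs as one plus the number of stepdowns K_i > K_{i+1}. For a random permutation of distinct keys each adjacent pair is a stepdown with probability ½, so the expected count is 1 + (N − 1)/2 = (N + 1)/2.

```python
import random


def count_runs(keys):
    """Number of ascending runs: 1 + number of stepdowns K_i > K_{i+1}."""
    if not keys:
        return 0
    return 1 + sum(1 for a, b in zip(keys, keys[1:]) if a > b)


def average_runs(n, trials, seed=1):
    """Average run count over `trials` random permutations of n distinct keys."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += count_runs(perm)
    return total / trials
```

Applied to the input line of Table 1, `count_runs` finds the 8 runs separated by its 7 left-to-right stepdowns.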

The  boundary  lines  between  runs  are  determined  in  Algorithm  N entirely  by 
stepdowns.  This  has  the  possible  advantage  that  input  files  with  a preponderance 
of  increasing  order  can  be  handled  very  quickly,  and  so  can  input  files  with 
a preponderance  of  decreasing  order;  but  it  slows  down  the  main  loop  of  the 
calculation.  Instead  of  testing  stepdowns,  we  can  determine  the  length  of  runs 
artificially,  by  saying  that  all  runs  in  the  input  have  length  1,  all  runs  after  the 
first pass (except possibly the last run) have length 2, ..., all runs after k passes
(except possibly the last run) have length 2^k. This is called a straight two-way merge,
as  opposed  to  the  “natural”  merge  in  Algorithm  N. 

Straight  two-way  merging  is  very  similar  to  Algorithm  N,  and  it  has  essen- 
tially the  same  flow  chart;  but  things  are  sufficiently  different  that  we  had  better 
write  down  the  whole  algorithm  again: 

Algorithm S (Straight two-way merge sort). Records R_1, ..., R_N are sorted
using two memory areas as in Algorithm N.

S1. [Initialize.] Set s ← 0, p ← 1. (For the significance of variables s, i, j, k,
l, and d, see Algorithm N. Here p represents the size of ascending runs to
be merged on the current pass; further variables q and r will keep track of
the number of unmerged items in a run.)

S2. [Prepare for pass.] If s = 0, set i ← 1, j ← N, k ← N, l ← 2N + 1; if s = 1,
set i ← N + 1, j ← 2N, k ← 0, l ← N + 1. Then set d ← 1, q ← p, r ← p.

S3. [Compare K_i : K_j.] If K_i > K_j, go to step S8.

S4. [Transmit R_i.] Set k ← k + d, R_k ← R_i.

S5. [End of run?] Set i ← i + 1, q ← q − 1. If q > 0, go back to step S3.

S6. [Transmit R_j.] Set k ← k + d. Then if k = l, go to step S13; otherwise set
R_k ← R_j.
Table 2

STRAIGHT TWO-WAY MERGE SORTING

503 | 087 | 512 | 061 | 908 | 170 | 897 | 275 | 653 | 426 | 154 | 509 | 612 | 677 | 765 | 703

503 703 | 512 677 | 509 908 | 426 897 | 653 275 | 170 154 | 612 061 | 765 087

087 503 703 765 | 154 170 509 908 | 897 653 426 275 | 677 612 512 061

061 087 503 512 612 677 703 765 | 908 897 653 509 426 275 170 154

061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908


S7. [End of run?] Set j ← j − 1, r ← r − 1. If r > 0, go back to step S6;
otherwise go to S12.

S8. [Transmit R_j.] Set k ← k + d, R_k ← R_j.

S9. [End of run?] Set j ← j − 1, r ← r − 1. If r > 0, go back to step S3.

S10. [Transmit R_i.] Set k ← k + d. Then if k = l, go to step S13; otherwise set
R_k ← R_i.

S11. [End of run?] Set i ← i + 1, q ← q − 1. If q > 0, go back to step S10.

S12. [Switch sides.] Set q ← p, r ← p, d ← −d, and interchange k ↔ l. If
j − i < p, return to step S10; otherwise return to S3.

S13. [Switch areas.] Set p ← p + p. If p < N, set s ← 1 − s and return to S2.
Otherwise sorting is complete; if s = 0, set

(R_1, ..., R_N) ← (R_{N+1}, ..., R_{2N}).

(The latter copying operation will be done if and only if ⌈lg N⌉ is odd, or in
the trivial case N = 1, regardless of the distribution of the input. Therefore
it is possible to predict the location of the sorted output in advance, and
copying will usually be unnecessary.) |
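A corresponding Python sketch of Algorithm S follows, under the same conventions as the sketch of Algorithm N (1-based R, second area at R[N+1..2N] prefilled with a copy of the input, a `step` variable standing in for the gotos, and a hypothetical function name). As in the algorithm, the only end-of-pass test is the check k = l in steps S6 and S10.

```python
def straight_merge_sort(keys):
    """Sketch of Algorithm S (straight two-way merge sort), 1-based R."""
    n = len(keys)
    if n == 0:
        return []
    R = [None] + list(keys) + list(keys)    # second area: contents immaterial
    s, p = 0, 1                             # S1
    step = "S2"
    while True:
        if step == "S2":                    # Prepare for pass
            if s == 0:
                i, j, k, l = 1, n, n, 2 * n + 1
            else:
                i, j, k, l = n + 1, 2 * n, 0, n + 1
            d, q, r = 1, p, p
            step = "S3"
        elif step == "S3":                  # Compare K_i : K_j
            step = "S8" if R[i] > R[j] else "S4"
        elif step == "S4":                  # Transmit R_i
            k += d
            R[k] = R[i]
            step = "S5"
        elif step == "S5":                  # End of left run?
            i, q = i + 1, q - 1
            step = "S3" if q > 0 else "S6"
        elif step == "S6":                  # Transmit R_j (or finish the pass)
            k += d
            if k == l:
                step = "S13"
            else:
                R[k] = R[j]
                step = "S7"
        elif step == "S7":                  # End of right run?
            j, r = j - 1, r - 1
            step = "S6" if r > 0 else "S12"
        elif step == "S8":                  # Transmit R_j
            k += d
            R[k] = R[j]
            step = "S9"
        elif step == "S9":                  # End of right run?
            j, r = j - 1, r - 1
            step = "S3" if r > 0 else "S10"
        elif step == "S10":                 # Transmit R_i (or finish the pass)
            k += d
            if k == l:
                step = "S13"
            else:
                R[k] = R[i]
                step = "S11"
        elif step == "S11":                 # End of left run?
            i, q = i + 1, q - 1
            step = "S10" if q > 0 else "S12"
        elif step == "S12":                 # Switch sides
            q, r, d = p, p, -d
            k, l = l, k
            step = "S10" if j - i < p else "S3"
        else:                               # S13: Switch areas
            p += p
            if p < n:
                s = 1 - s
                step = "S2"
            else:
                if s == 0:
                    R[1:n + 1] = R[n + 1:2 * n + 1]
                return R[1:n + 1]
```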

An example of this algorithm appears in Table 2. It is somewhat amazing
that the method works properly when N is not a power of 2; the runs being
merged are not all of length 2^k, yet no provision has apparently been made for
the exceptions! (See exercise 8.) The former tests for stepdowns have been
replaced by decrementing q or r and testing the result for zero; this reduces the
asymptotic MIX running time to 11N lg N units, slightly faster than we were able
to achieve with Algorithm N.

In  practice  it  would  be  worthwhile  to  combine  Algorithm  S with  straight 
insertion;  we  can  sort  groups  of,  say,  16  items  using  straight  insertion,  in  place  of 
the  first  four  passes  of  Algorithm  S,  thereby  avoiding  the  comparatively  wasteful 
bookkeeping  operations  involved  in  short  merges.  As  we  saw  with  quicksort, 
such  a combination  of  methods  does  not  affect  the  asymptotic  running  time,  but 
it  gives  us  a reasonable  improvement  nevertheless. 
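The following Python sketch illustrates the combination suggested above. It is a simplification rather than a modification of Algorithm S itself: groups of `group` items (16 by default, as in the text) are first sorted by straight insertion, and the resulting runs are then merged left to right in passes of doubling run length, alternating between two arrays, instead of from both ends as in Algorithm S. The function names are hypothetical.

```python
def insertion_sort_range(a, lo, hi):
    """Straight insertion sort of the slice a[lo:hi], in place."""
    for t in range(lo + 1, hi):
        key = a[t]
        u = t - 1
        while u >= lo and a[u] > key:
            a[u + 1] = a[u]
            u -= 1
        a[u + 1] = key


def hybrid_merge_sort(keys, group=16):
    """Insertion-sort groups of `group` items, then merge runs of doubling length."""
    src = list(keys)
    n = len(src)
    for lo in range(0, n, group):
        insertion_sort_range(src, lo, min(lo + group, n))
    dst = [None] * n
    width = group
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, lo
            while i < mid and j < hi:          # two-way merge of src[lo:mid], src[mid:hi]
                if src[i] <= src[j]:
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
                k += 1
            # At most one of the two runs has items left; copy them over.
            if i < mid:
                dst[k:hi] = src[i:mid]
            else:
                dst[k:hi] = src[j:hi]
        src, dst = dst, src                    # output of this pass feeds the next
        width *= 2
    return src
```

The group size trades bookkeeping for insertion-sort work; as the text notes for quicksort, such a cutoff changes only the constant factor, not the asymptotic running time.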

Let  us  now  study  Algorithms  N and  S from  the  standpoint  of  data  structures. 
Why did we need 2N record locations instead of N? The reason is comparatively
simple: We were dealing with four lists of varying size (two source lists and
two destination lists on each pass); and we were using the standard “growing
together” idea discussed in Section 2.2.2, for each pair of sequentially allocated
lists.  But  half  of  the  memory  space  was  always  unused,  and  a little  reflection 
shows  that  we  could  really  make  use  of  a linked  allocation  for  the  four  lists.  If 
we  add  one  link  field  to  each  of  the  N records,  we  can  do  everything  required 
by  the  merging  algorithms  using  simple  link  manipulations,  without  moving  the 
records  at  all!  Adding  N link  fields  is  generally  better  than  adding  the  space 
needed  for  N more  records,  and  the  reduced  record  movement  may  also  save 
us  time,  unless  our  computer  memory  is  especially  good  at  sequential  reading 
and  writing.  Therefore  we  ought  to  consider  also  a merging  algorithm  like  the 
following  one: 

Algorithm L (List merge sort). Records R_1, ..., R_N are assumed to contain
keys K_1, ..., K_N, together with link fields L_1, ..., L_N capable of holding the
numbers −(N + 1) through (N + 1). There are two auxiliary link fields L_0 and
L_{N+1} in artificial records R_0 and R_{N+1} at the beginning and end of the file. This
algorithm is a “list sort” that sets the link fields so that the records are linked
together in ascending order. After sorting is complete, L_0 will be the index of
the record with the smallest key; and L_k, for 1 ≤ k ≤ N, will be the index of the
record that follows R_k, or L_k = 0 if R_k is the record with the largest key. (See
Eq. 5.2.1–(13).)

During the course of this algorithm, R_0 and R_{N+1} serve as list heads for two
linear lists whose sublists are being merged. A negative link denotes the end of
a sublist known to be ordered; a zero link denotes the end of the entire list. We
assume that N ≥ 2.

The notation “|L_s| ← p” means “Set L_s to p or −p, retaining the previous
sign of L_s.” This operation is well-suited to MIX, but unfortunately not to most
computers; it is possible to modify the algorithm in straightforward ways to
obtain an equally efficient method for most other machines.

L1. [Prepare two lists.] Set L_0 ← 1, L_{N+1} ← 2, L_i ← −(i + 2) for 1 ≤ i ≤ N − 2,
and L_{N−1} ← L_N ← 0. (We have created two lists containing R_1, R_3, R_5, ...
and R_2, R_4, R_6, ..., respectively; the negative links indicate that each ordered
sublist consists of one element only. For another way to do this step,
taking advantage of ordering that may be present in the initial data, see
exercise 12.)

L2. [Begin new pass.] Set s ← 0, t ← N + 1, p ← L_s, q ← L_t. If q = 0, the
algorithm terminates. (During each pass, p and q traverse the lists being
merged; s usually points to the most recently processed record of the current
sublist, while t points to the end of the previously output sublist.)

L3. [Compare K_p : K_q.] If K_p > K_q, go to L6.

L4. [Advance p.] Set |L_s| ← p, s ← p, p ← L_p. If p > 0, return to L3.

L5. [Complete the sublist.] Set L_s ← q, s ← t. Then set t ← q and q ← L_q, one
or more times, until q ≤ 0. Finally go to L8.

L6. [Advance q.] (Steps L6 and L7 are dual to L4 and L5.) Set |L_s| ← q, s ← q,
q ← L_q. If q > 0, return to L3.



Table  3 

LIST  MERGE  SORTING 


j       0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
K_j     -  503  087  512  061  908  170  897  275  653  426  154  509  612  677  765  703    -
L_j     1   -3   -4   -5   -6   -7   -8   -9  -10  -11  -12  -13  -14  -15  -16    0    0    2
L_j     2   -6    1   -8    3  -10    5  -11    7  -13    9   12  -16   14    0    0   15    4
L_j     4    3    1  -11    2  -13    8    5    7    0   12   10    9   14   16    0   15    6
L_j     4    3    6    7    2    0    8    5    1   14   12   10   13    9   16    0   15   11
L_j     4   12   11   13    2    0    8    5   10   14    1    6    3    9   16    7   15    0

L7. [Complete the sublist.] Set L_s ← p, s ← t. Then set t ← p and p ← L_p, one
or more times, until p ≤ 0.

L8. [End of pass?] (At this point, p ≤ 0 and q ≤ 0, since both pointers have
moved to the end of their respective sublists.) Set p ← −p, q ← −q. If
q = 0, set |L_s| ← p, |L_t| ← 0 and return to L2. Otherwise return to L3. |
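Here is a hedged Python transcription of Algorithm L. The names `set_abs`, `list_merge_sort`, and `linked_order` are hypothetical; `set_abs` implements the sign-retaining assignment |L_s| ← p (a zero link counts as positive, which Python cannot distinguish from −0 anyway). Records are numbered 1 to N as in the text, with keys[0] holding K_1. If a list is passed as `trace`, a copy of the links is appended each time step L2 begins, so for the 16 keys of Table 3 the trace should reproduce the rows of that table.

```python
def set_abs(L, s, p):
    """|L_s| <- p: store p, keeping the sign L_s had before (0 counts as +)."""
    L[s] = -p if L[s] < 0 else p


def list_merge_sort(keys, trace=None):
    """Sketch of Algorithm L; returns the link array L[0..N+1]."""
    n = len(keys)
    if n < 2:
        raise ValueError("Algorithm L assumes N >= 2")
    K = [None] + list(keys) + [None]        # R_0 and R_{N+1} are artificial
    L = [0] * (n + 2)
    L[0], L[n + 1] = 1, 2                   # L1: lists R1,R3,... and R2,R4,...
    for i in range(1, n - 1):
        L[i] = -(i + 2)
    L[n - 1] = L[n] = 0
    while True:
        if trace is not None:               # L2: begin new pass
            trace.append(L.copy())
        s, t = 0, n + 1
        p, q = L[s], L[t]
        if q == 0:
            return L
        while True:
            if K[p] > K[q]:                 # L3: compare K_p : K_q
                set_abs(L, s, q)            # L6: advance q
                s, q = q, L[q]
                if q > 0:
                    continue
                L[s], s = p, t              # L7: complete the sublist
                while True:
                    t, p = p, L[p]
                    if p <= 0:
                        break
            else:
                set_abs(L, s, p)            # L4: advance p
                s, p = p, L[p]
                if p > 0:
                    continue
                L[s], s = q, t              # L5: complete the sublist
                while True:
                    t, q = q, L[q]
                    if q <= 0:
                        break
            p, q = -p, -q                   # L8: end of pass?
            if q == 0:
                set_abs(L, s, p)
                set_abs(L, t, 0)
                break                       # back to L2
            # otherwise back to L3


def linked_order(keys, L):
    """Follow the final links from L[0] and return the keys in that order."""
    out, j = [], L[0]
    while j != 0:
        out.append(keys[j - 1])
        j = L[j]
    return out
```

No records are moved; only the N + 2 link fields change, which is the space and time advantage discussed above.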

An  example  of  this  algorithm  in  action  appears  in  Table  3,  where  we  can  see  the 
link  settings  each  time  step  L2  is  encountered.  It  is  possible  to  rearrange  the 
records R_1, ..., R_N at the end of this algorithm so that their keys are in order,
using  the  method  of  exercise  5.2-12.  There  is  an  interesting  similarity  between 
list  merging  and  the  addition  of  sparse  polynomials  (see  Algorithm  2.2.4A). 

Let  us  now  construct  a MIX  program  for  Algorithm  L,  to  see  whether  the 
list  manipulation  is  advantageous  from  the  standpoint  of  speed  as  well  as  space: 

Program L (List merge sort). For convenience, we assume that records are
one word long, with L_j in the (0:2) field and K_j in the (3:5) field of location
INPUT + j; rI1 ≡ p, rI2 ≡ q, rI3 ≡ s, rI4 ≡ t, rA ≡ K_q; N ≥ 2.

L      EQU   0:2                        Definition of field names
ABSL   EQU   1:2
KEY    EQU   3:5
START  ENT1  N-2              1         L1. Prepare two lists.
       ENNA  2,1              N-2
       STA   INPUT,1(L)       N-2       L_i ← −(i + 2).
       DEC1  1                N-2
       J1P   *-3              N-2       N − 2 ≥ i > 0.
       ENTA  1                1
       STA   INPUT(L)         1         L_0 ← 1.
       ENTA  2                1
       STA   INPUT+N+1(L)     1         L_{N+1} ← 2.
       STZ   INPUT+N-1(L)     1         L_{N−1} ← 0.
       STZ   INPUT+N(L)       1         L_N ← 0.
       JMP   L2               1         To L2.
L3Q    LDA   INPUT,2          C''+B'    L3. Compare K_p : K_q.
L3P    CMPA  INPUT,1(KEY)     C
       JL    L6               C         To L6 if K_q < K_p.
L4     ST1   INPUT,3(ABSL)    C'        L4. Advance p. |L_s| ← p.
       ENT3  0,1              C'        s ← p.
       LD1   INPUT,1(L)       C'        p ← L_p.
       J1P   L3P              C'        To L3 if p > 0.
L5     ST2   INPUT,3(L)       B'        L5. Complete the sublist. L_s ← q.
       ENT3  0,4              B'        s ← t.
       ENT4  0,2              D'        t ← q.
       LD2   INPUT,2(L)       D'        q ← L_q.
       J2P   *-2              D'        Repeat if q > 0.
       JMP   L8               B'        To L8.
L6     ST2   INPUT,3(ABSL)    C''       L6. Advance q. |L_s| ← q.
       ENT3  0,2              C''       s ← q.
       LD2   INPUT,2(L)       C''       q ← L_q.
       J2P   L3Q              C''       To L3 if q > 0.
L7     ST1   INPUT,3(L)       B''       L7. Complete the sublist. L_s ← p.
       ENT3  0,4              B''       s ← t.
       ENT4  0,1              D''       t ← p.
       LD1   INPUT,1(L)       D''       p ← L_p.
       J1P   *-2              D''       Repeat if p > 0.
L8     ENN1  0,1              B         L8. End of pass? p ← −p.
       ENN2  0,2              B         q ← −q.
       J2NZ  L3Q              B         To L3 if q ≠ 0.
       ST1   INPUT,3(ABSL)    A         |L_s| ← p.
       STZ   INPUT,4(ABSL)    A         |L_t| ← 0.
L2     ENT3  0                A+1       L2. Begin new pass. s ← 0.
       ENT4  N+1              A+1       t ← N + 1.
       LD1   INPUT(L)         A+1       p ← L_s.
       LD2   INPUT+N+1(L)     A+1       q ← L_t.
       J2NZ  L3Q              A+1       To L3 if q ≠ 0.  |

The running time of this program can be deduced using techniques we have
seen many times before (see exercises 13 and 14); it comes to approximately
(10N lg N + 4.92N)u on the average, with a small standard deviation of order
√N. Exercise 15 shows that the running time can in fact be reduced to about
(8N lg N)u, at the expense of a substantially longer program.

Thus  we  have  a clear  victory  for  linked-memory  techniques  over  sequential 
allocation,  when  internal  merging  is  being  done:  Less  memory  space  is  required, 
and  the  program  runs  about  10  to  20  percent  faster.  Similar  algorithms  have 
been  published  by  L.  J.  Woodrum  [IBM  Systems  J.  8 (1969),  189-203]  and 
A. D. Woodall [Comp. J. 13 (1970), 110–111].

EXERCISES 

1. [21] Generalize Algorithm M to a k-way merge of the input files x_{i1} ≤ ··· ≤ x_{im_i}
for i = 1, 2, ..., k.

2. [M24] Assuming that each of the $\binom{m+n}{m}$ possible arrangements of m x’s among
n y’s is equally likely, find the mean and standard deviation of the number of times
step M2 is performed during Algorithm M. What are the maximum and minimum
values of this quantity?

► 3. [20] (Updating.) Given records R_1, ..., R_M and R'_1, ..., R'_N whose keys are
distinct and in order, so that K_1 < ··· < K_M and K'_1 < ··· < K'_N, show how to modify
Algorithm M to obtain a merged file in which records R_i of the first file have been
discarded if their keys appear also in the second file.
4. [21] The text observes that merge sorting may be regarded as a generalization
of  insertion  sorting.  Show  that  merge  sorting  is  also  strongly  related  to  tree  selection 
sorting  as  depicted  in  Fig.  23. 

► 5.  [21]  Prove  that  i can  never  be  equal  to  j in  steps  N6  or  N10.  (Therefore  it  is 
unnecessary  to  test  for  a possible  jump  to  N13  in  those  steps.) 

6. [22] Find a permutation K_1 K_2 ... K_16 of {1, 2, ..., 16} such that

K_2 > K_3, K_4 > K_5, K_6 > K_7, K_8 > K_9, K_10 < K_11, K_12 < K_13, K_14 < K_15,

yet Algorithm N will sort the file in only two passes. (Since there are eight or more
runs, we would expect to have at least four runs after the first pass, two runs after
the second pass, and sorting would ordinarily not be complete until after at least three
passes. How can we get by with only two passes?)

7.  [16]  Give  a formula  for  the  exact  number  of  passes  required  by  Algorithm  S,  as  a 
function  of  N. 

8.  [22]  During  Algorithm  S,  the  variables  q and  r are  supposed  to  represent  the 
lengths  of  the  unmerged  elements  in  the  runs  currently  being  processed;  q and  r both 
start  out  equal  to  p,  while  the  runs  are  not  always  this  long.  How  can  this  possibly 
work? 

9. [24] Write a MIX program for Algorithm S. Specify the instruction frequencies in
terms of quantities analogous to A, B', B'', C', ... in Program L.

10. [25] (D. A. Bell.) Show that sequentially allocated straight two-way merging can
be done with at most $\frac{3}{2}N$ memory locations, instead of 2N as in Algorithm S.

11.  [21]  Is  Algorithm  L a stable  sorting  method? 

► 12. [22] Revise step L1 of Algorithm L so that the two-way merge is "natural," taking
advantage of ascending runs that are initially present. (In particular, if the input is
already sorted, step L2 should terminate the algorithm immediately after your step L1
has acted.)

► 13. [M34] Give an analysis of the average running time of Program L, in the style
of other analyses in this chapter: Interpret the quantities A, B, B', ..., and explain
how  to  compute  their  exact  average  values.  How  long  does  Program  L take  to  sort  the 
16  numbers  in  Table  3? 

14. [M24] Let the binary representation of N be $2^{e_1} + 2^{e_2} + \cdots + 2^{e_t}$, where $e_1 > e_2 >
\cdots > e_t \ge 0$, $t \ge 1$. Prove that the maximum number of key comparisons performed
by Algorithm L is $1 - 2^{e_t} + \sum_{k=1}^{t} (e_k + k - 1)2^{e_k}$.

15. [20] Hand simulation of Algorithm L reveals that it occasionally does redundant
operations; the assignments $|L_s| \leftarrow p$, $|L_s| \leftarrow q$ in steps L4 and L6 are unnecessary
about half of the time, since we have $L_s = p$ (or q) each time step L4 (or L6) returns
to L3. How can Program L be improved so that this redundancy disappears?

16.  [28]  Design  a list  merging  algorithm  like  Algorithm  L but  based  on  three-way 
merging. 

17.  [20]  (J.  McCarthy.)  Let  the  binary  representation  of  N be  as  in  exercise  14,  and 
assume  that  we  are  given  N records  arranged  in  t ordered  subfiles  of  respective  sizes 
$2^{e_1}, 2^{e_2}, \ldots, 2^{e_t}$. Show how to maintain this state of affairs when a new (N + 1)st record
is added and N ← N + 1. (The resulting algorithm may be called an online merge sort.)
Fig. 31. A railway network with five "stacks."


18.  [40]  (M.  A.  Kronrod.)  Given  a file  of  N records  containing  only  two  runs, 

$$K_1 \le \cdots \le K_M \quad\text{and}\quad K_{M+1} \le \cdots \le K_N,$$

is  it  possible  to  sort  the  file  with  O(N)  operations  in  a random-access  memory,  using 
only  a small  fixed  amount  of  additional  memory  space  regardless  of  the  sizes  of  M 
and N? (All of the merging algorithms described in this section make use of extra
memory  space  proportional  to  N.) 

19.  [26]  Consider  a railway  switching  network  with  n “stacks,”  as  shown  in  Fig.  31 
when n = 5; we considered one-stack networks in exercises 2.2.1-2 through 2.2.1-5. If
N railroad cars enter at the right, we observed that only comparatively few of the N!
permutations of those cars could appear at the left, in the one-stack case.

In the n-stack network, assume that $2^n$ cars enter at the right. Prove that each
of the $(2^n)!$ possible permutations of these cars is achievable at the left, by a suitable
sequence  of  operations.  (Each  stack  is  actually  much  bigger  than  indicated  in  the 
illustration  — big  enough  to  accommodate  all  the  cars,  if  necessary.) 

20. [47] In the notation of exercise 2.2.1-4, at most $a_N^n$ permutations of N elements
can be produced with an n-stack railway network; hence the number of stacks needed
to obtain all N! permutations is at least $\log N!/\log a_N \approx \log_4 N$. Exercise 19 shows
that at most ⌈lg N⌉ stacks are needed. What is the true rate of growth of the necessary
number of stacks, as N → ∞?

21.  [23]  (A.  J.  Smith.)  Explain  how  to  extend  Algorithm  L so  that,  in  addition  to 
sorting,  it  computes  the  number  of  inversions  present  in  the  input  permutation. 

22.  [28]  (J.  K.  R.  Barnett.)  Develop  a way  to  speed  up  merge  sorting  on  multiword 
keys.  (Exercise  5.2.2-30  considers  the  analogous  problem  for  quicksort.) 

23.  [M30]  Exercises  13  and  14  analyze  a “bottom-up”  or  iterative  version  of  merge 
sort,  where  the  cost  c(N)  of  sorting  N items  satisfies  the  recurrence 

$$c(N) = c(2^k) + c(N - 2^k) + f(2^k, N - 2^k) \quad\text{for } 2^k < N \le 2^{k+1},$$

and  f(m,n)  is  the  cost  of  merging  m things  with  n.  Study  the  “top-down”  or  divide- 
and-conquer  recurrence 

$$c(N) = c(\lceil N/2 \rceil) + c(\lfloor N/2 \rfloor) + f(\lceil N/2 \rceil, \lfloor N/2 \rfloor) \quad\text{for } N > 1,$$
which  arises  when  merge  sort  is  programmed  recursively. 

5.2.5.  Sorting  by  Distribution 

We  come  now  to  an  interesting  class  of  sorting  methods  that  are  essentially  the 
exact  opposite  of  merging,  when  considered  from  a standpoint  we  shall  discuss 
in Section 5.4.7. These methods were used to sort punched cards for many years,
long  before  electronic  computers  existed.  The  same  approach  can  be  adapted  to 
computer  programming,  and  it  is  generally  known  as  “bucket  sorting,”  “radix 
sorting,”  or  “digital  sorting,”  because  it  is  based  on  the  digits  of  the  keys. 

Suppose  we  want  to  sort  a 52-card  deck  of  playing  cards.  We  may  define 

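$$\mathrm{A} < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < \mathrm{J} < \mathrm{Q} < \mathrm{K},$$
as an ordering of the face values, and for the suits we may define

$$\clubsuit < \diamondsuit < \heartsuit < \spadesuit.$$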

One  card  is  to  precede  another  if  either  (i)  its  suit  is  less  than  the  other  suit,  or 
(ii)  its  suit  equals  the  other  suit  but  its  face  value  is  less.  (This  is  a particular 
case  of  lexicographic  ordering  between  ordered  pairs  of  objects;  see  exercise  5-2.) 
Thus 

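$$\mathrm{A}\clubsuit < 2\clubsuit < \cdots < \mathrm{K}\clubsuit < \mathrm{A}\diamondsuit < \cdots < \mathrm{Q}\spadesuit < \mathrm{K}\spadesuit.$$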

We  could  sort  the  cards  by  any  of  the  methods  already  discussed.  Card 
players  often  use  a technique  somewhat  analogous  to  the  idea  behind  radix 
exchange:  First  they  divide  the  cards  into  four  piles,  according  to  suit,  then 
they  fiddle  with  each  individual  pile  until  everything  is  in  order. 

But  there  is  a faster  way  to  do  the  trick!  First  deal  the  cards  face  up  into 
13  piles,  one  for  each  face  value.  Then  collect  these  piles  by  putting  the  aces 
on  the  bottom,  the  2s  face  up  on  top  of  them,  then  the  3s,  etc.,  finally  putting 
the  kings  (face  up)  on  top.  Turn  the  deck  face  down  and  deal  again,  this  time 
into  four  piles  for  the  four  suits.  (Again  you  turn  the  cards  face  up  as  you  deal 
them.)  By  putting  the  resulting  piles  together,  with  clubs  on  the  bottom,  then 
diamonds,  hearts,  and  spades,  you’ll  get  the  deck  in  perfect  order. 
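The two deals can be mimicked in a few lines of Python. This is a minimal sketch under assumed conventions: a card is a (suit, face) pair of small integers, with suits numbered 0 to 3 for clubs, diamonds, hearts, spades and faces 0 to 12 for ace through king, and the face-up and face-down handling of the physical deck is abstracted into "pile 0 comes out first." The helper names `deal` and `sort_deck` are illustrative only.

```python
import random

SUITS = "CDHS"  # clubs < diamonds < hearts < spades
FACES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def deal(cards, pile_of, npiles):
    """Deal cards into npiles piles in order, then gather them with pile 0 first.
    Cards keep their relative order within a pile, so each deal is stable."""
    piles = [[] for _ in range(npiles)]
    for card in cards:
        piles[pile_of(card)].append(card)
    return [card for pile in piles for card in pile]

def sort_deck(cards):
    by_face = deal(cards, lambda c: c[1], 13)  # first deal: 13 piles by face value
    return deal(by_face, lambda c: c[0], 4)    # second deal: 4 piles by suit

deck = [(s, f) for s in range(4) for f in range(13)]  # the desired final order
shuffled = random.sample(deck, len(deck))
print(" ".join(FACES[f] + SUITS[s] for s, f in sort_deck(shuffled)[:5]))  # AC 2C 3C 4C 5C
```

Dealing by suit first and by face value second would instead leave the deck ordered primarily by face value, which is why the less significant component is dealt first.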

The  same  idea  applies  to  the  sorting  of  numbers  and  alphabetic  data.  Why 
does  it  work?  Because  (in  our  playing  card  example)  if  two  cards  go  into  different 
piles  in  the  final  deal,  they  have  different  suits,  so  the  one  with  the  lower  suit  is 
lowest.  But  if  two  cards  have  the  same  suit  (and  consequently  go  into  the  same 
pile),  they  are  already  in  proper  order  because  of  the  previous  sorting.  In  other 
words,  the  face  values  will  be  in  increasing  order,  on  each  of  the  four  piles,  as  we 
deal  the  cards  on  the  second  pass.  The  same  proof  can  be  abstracted  to  show 
that  any  lexicographic  ordering  can  be  sorted  in  this  way;  for  details,  see  the 
answer  to  exercise  5-2,  at  the  beginning  of  this  chapter. 

The  sorting  method  just  described  is  not  immediately  obvious,  and  it  isn’t 
clear  who  first  discovered  the  fact  that  it  works  so  conveniently.  A 19-page 
pamphlet entitled "The Inventory Simplified," published by the Tabulating Machines
Company division of IBM in 1923, presented an interesting Digit Plan
method  for  forming  sums  of  products  on  their  Electric  Sorting  Machine:  Suppose, 
for  example,  that  we  want  to  multiply  the  number  punched  in  columns  1-10 
by  the  number  punched  in  columns  23-25,  and  to  sum  all  of  these  products 
for  a large  number  of  cards.  We  can  sort  first  on  column  25,  then  use  the 
Tabulating Machine to find the quantities $a_1, a_2, \ldots, a_9$, where $a_k$ is the total
of columns 1-10 summed over all cards having k in column 25. Then we can
sort on column 24, finding the analogous totals $b_1, b_2, \ldots, b_9$; also on column 23,
obtaining $c_1, c_2, \ldots, c_9$. The desired sum of products is easily seen to be

$$a_1 + 2a_2 + \cdots + 9a_9 + 10b_1 + 20b_2 + \cdots + 90b_9 + 100c_1 + 200c_2 + \cdots + 900c_9.$$
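The identity behind the Digit Plan can be checked numerically. The Python sketch below computes the totals $a_k$, $b_k$, $c_k$ directly from each card's multiplier digits rather than by physically sorting the cards; the function name and the (x, y) card representation are assumptions made for illustration.

```python
def digit_plan_sum(cards):
    """Sum of x*y over all cards, computed only from per-digit totals.

    cards: list of (x, y) pairs; x plays the role of columns 1-10 and y is a
    multiplier of at most three digits (columns 23-25).
    """
    a = [0] * 10   # a[k]: total of x over cards whose units digit (column 25) is k
    b = [0] * 10   # b[k]: same for the tens digit (column 24)
    c = [0] * 10   # c[k]: same for the hundreds digit (column 23)
    for x, y in cards:
        a[y % 10] += x
        b[y // 10 % 10] += x
        c[y // 100 % 10] += x
    # a_1 + 2a_2 + ... + 9a_9 + 10b_1 + ... + 90b_9 + 100c_1 + ... + 900c_9
    return sum(k * (a[k] + 10 * b[k] + 100 * c[k]) for k in range(1, 10))

print(digit_plan_sum([(12, 345), (7, 21)]))   # 4287, which is 12*345 + 7*21
```

Cards with digit 0 contribute nothing, which is why the sum starts at k = 1.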

This  punched-card  tabulating  method  leads  naturally  to  the  discovery  of  least- 
significant-digit-first  radix  sorting,  so  it  probably  became  known  to  the  machine 
operators. The first published reference to this principle for sorting appears in
L. J. Comrie's early discussion of punched-card equipment [Transactions of the
Office Machinery Users' Assoc., Ltd. (1929), 25-37, especially page 28].

In  order  to  handle  radix  sorting  inside  a computer,  we  must  decide  what  to 
do  with  the  piles.  Suppose  that  there  are  M piles;  we  could  set  aside  M areas  of 
memory,  moving  each  record  from  an  input  area  into  its  appropriate  pile  area. 
But this is unsatisfactory, since each area must be large enough to hold N items,
and (M + 1)N record spaces would be required. Therefore most people rejected
the idea of radix sorting within a computer, until H. H. Seward [Master's thesis,
M.I.T. Digital Computer Laboratory Report R-232 (1954), 25-28] pointed out
that we can achieve the same effect with only 2N record areas and M count fields.
We  simply  count  how  many  elements  will  lie  in  each  of  the  M piles,  by  making 
a preliminary  pass  over  the  data;  this  tells  us  precisely  how  to  allocate  memory 
for the piles. We have already made use of the same idea in the "distribution
counting sort," Algorithm 5.2D.
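Seward's scheme can be sketched as one distribution pass in three parts: counting, allocating, and moving. This is a sketch, not Algorithm 5.2D verbatim: the function name is hypothetical, `digit` is a caller-supplied function, and the source is scanned from its last record so that each pile is filled from its top down, which keeps records with equal digits in their original order.

```python
def distribution_pass(source, digit, M):
    """Move records from `source` into a new area of the same size, grouped by
    digit(record) in range(M). Returns (counts, allocations, destination)."""
    counts = [0] * M
    for rec in source:                 # counting: size of each pile
        counts[digit(rec)] += 1
    alloc, total = [], 0
    for c in counts:                   # allocating: alloc[i] = size of piles 0..i
        total += c
        alloc.append(total)
    # The copy `slot` exists only so that alloc can be returned for inspection.
    slot = alloc[:]                    # slot[i] = one past the next free place in pile i
    dest = [None] * len(source)
    for rec in reversed(source):       # moving, last record first
        i = digit(rec)
        slot[i] -= 1
        dest[slot[i]] = rec
    return counts, alloc, dest

inp = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
counts, alloc, aux = distribution_pass(inp, lambda r: r % 10, 10)
print(counts)   # [1, 1, 2, 3, 1, 2, 1, 3, 1, 1]
print(alloc)    # [1, 2, 4, 7, 8, 10, 11, 14, 15, 16]
```

Only the source and destination areas (2N record spaces) and M counters are needed, in contrast to the (M + 1)N spaces of the naive scheme.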

Thus  radix  sorting  can  be  carried  out  as  follows:  Start  with  a distribution 
sort  based  on  the  least  significant  digit  of  the  keys  (in  radix  M notation),  moving 
records  from  the  input  area  to  an  auxiliary  area.  Then  do  another  distribution 
sort,  on  the  next  least  significant  digit,  moving  the  records  back  into  the  original 
input  area;  and  so  on,  until  the  final  pass  (on  the  most  significant  digit)  puts  all 
records  into  the  desired  order. 

If  we  have  a decimal  computer  with  12-digit  keys,  and  if  N is  rather  large,  we 
can choose M = 1000 (considering three decimal digits as one radix-1000 digit);
then sorting will be complete in four passes, regardless of the size of N. Similarly,
if we have a binary computer and a 40-bit key, we can set $M = 1024 = 2^{10}$ and
complete  the  sorting  in  four  passes.  Actually  each  pass  consists  of  three  parts 
(counting,  allocating,  moving);  E.  H.  Friend  [JACM  3 (1956),  151]  suggested 
combining  two  of  those  parts  at  the  expense  of  M more  memory  locations,  by 
accumulating  the  counts  for  pass  k + 1 while  moving  the  records  on  pass  k. 

Table  1 shows  how  such  a radix  sort  can  be  applied  to  our  16  example 
numbers, with M = 10. Radix sorting is generally not useful for such small N,
so a small example like this is intended to illustrate the sufficiency rather than
the  efficiency  of  the  method. 
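A self-contained Python sketch of the whole procedure reproduces the rows of Table 1. The function name and the `trace` it returns are hypothetical; each pass repeats the counting, allocating, and moving steps, and the records shuttle between two areas (the input area and the auxiliary area of the text).

```python
def radix_sort_trace(keys, M, p):
    """LSD-first radix sort of integers in range(M**p). Returns the sorted list
    and, for each pass, (counts, storage allocations, new area contents)."""
    area, trace = list(keys), []
    for k in range(p):                           # k = 0 is the least significant digit
        scale = M ** k
        counts = [0] * M
        for r in area:
            counts[r // scale % M] += 1
        alloc, total = [], 0
        for c in counts:
            total += c
            alloc.append(total)
        slot, other = alloc[:], [None] * len(area)
        for r in reversed(area):                 # stable move into the other area
            i = r // scale % M
            slot[i] -= 1
            other[slot[i]] = r
        area = other
        trace.append((counts, alloc, area))
    return area, trace

table1 = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
result, trace = radix_sort_trace(table1, 10, 3)
for counts, alloc, contents in trace:
    print(counts, alloc, " ".join(f"{r:03d}" for r in contents))
```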

An alert, "modern" reader will note, however, that the whole idea of making
digit counts for the storage allocation is tied to old-fashioned ideas about
sequential  data  representation.  We  know  that  linked  allocation  is  specifically 
designed  to  handle  a set  of  tables  of  variable  size,  so  it  is  natural  to  choose  a 
linked  data  structure  for  radix  sorting.  Since  we  traverse  each  pile  serially,  all 
Table 1
RADIX SORTING

Input area contents:                          503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
Counts for units digit distribution:          1 1 2 3 1 2 1 3 1 1
Storage allocations based on these counts:    1 2 4 7 8 10 11 14 15 16
Auxiliary area contents:                      170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509
Counts for tens digit distribution:           4 2 1 0 0 2 2 3 1 1
Storage allocations based on these counts:    4 6 7 7 7 9 11 14 15 16
Input area contents:                          503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897
Counts for hundreds digit distribution:       2 2 1 0 1 3 3 2 1 1
Storage allocations based on these counts:    2 4 5 5 6 9 12 14 15 16
Auxiliary area contents:                      061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

we  need  is  a single  link  from  each  item  to  its  successor.  Furthermore,  we  never 
need  to  move  the  records;  we  merely  adjust  the  links  and  proceed  merrily  down 
the lists. The amount of memory required is (1 + ε)N + 2εM records, where ε
is the amount of space taken up by a link field. Formal details of this procedure
are  rather  interesting  since  they  furnish  an  excellent  example  of  typical  data 
structure  manipulations,  combining  sequential  and  linked  allocation: 

Algorithm R (Radix list sort). Records $R_1, \ldots, R_N$ are each assumed to contain
a LINK field. Their keys are assumed to be p-tuples

$$(a_1, a_2, \ldots, a_p), \qquad 0 \le a_i < M, \tag{1}$$

where the order is defined lexicographically so that

$$(a_1, a_2, \ldots, a_p) < (b_1, b_2, \ldots, b_p) \tag{2}$$

if and only if for some j, 1 ≤ j ≤ p, we have

$$a_i = b_i \ \text{for all } i < j, \quad\text{but } a_j < b_j. \tag{3}$$

The keys may, in particular, be thought of as numbers written in radix M
notation,

$$a_1 M^{p-1} + a_2 M^{p-2} + \cdots + a_{p-1} M + a_p, \tag{4}$$

and in this case lexicographic order corresponds to the normal ordering of
nonnegative numbers. The keys may also be strings of alphabetic letters, etc.

Sorting  is  done  by  keeping  M “piles”  of  records,  in  a manner  that  exactly 
parallels  the  action  of  a card  sorting  machine.  The  piles  are  really  queues  in  the 
sense  of  Chapter  2,  since  we  link  them  together  so  that  they  are  traversed  in  a 
first-in-first-out manner. There are two pointer variables TOP[i] and BOTM[i]
for each pile, 0 ≤ i < M, and we assume as in Chapter 2 that

$$\text{LINK}(\text{LOC}(\text{BOTM}[i])) \equiv \text{BOTM}[i]. \tag{5}$$
Fig. 32. Radix list sort.


R1. [Loop on k.] In the beginning, set P ← LOC($R_N$), a pointer to the last
record. Then perform steps R2 through R6 for k = 1, 2, ..., p. (Steps R2
through R6 constitute one "pass.") Then the algorithm terminates, with P
pointing to the record with the smallest key, LINK(P) to the record with next
smallest, then LINK(LINK(P)), etc.; the LINK in the final record will be Λ.

R2. [Set piles empty.] Set TOP[i] ← LOC(BOTM[i]) and BOTM[i] ← Λ, for
0 ≤ i < M.

R3. [Extract kth digit of key.] Let KEY(P), the key in the record referenced by P,
be $(a_1, a_2, \ldots, a_p)$; set i ← $a_{p+1-k}$, the kth least significant digit of this key.

R4. [Adjust links.] Set LINK(TOP[i]) ← P, then set TOP[i] ← P.

R5. [Step to next record.] If k = 1 (the first pass) and if P = LOC($R_j$), for some
j ≠ 1, set P ← LOC($R_{j-1}$) and return to R3. If k > 1 (subsequent passes),
set P ← LINK(P), and return to R3 if P ≠ Λ.

R6. [Do Algorithm H.] (We are now done distributing all elements onto the
piles.) Perform Algorithm H below, which "hooks together" the individual
piles into one list, in preparation for the next pass. Then set P ← BOTM[0],
a pointer to the first element of the hooked-up list. (See exercise 3.) |

Algorithm H (Hooking-up of queues). Given M queues, linked according to
the conventions of Algorithm R, this algorithm adjusts at most M links so that
a single queue is created, with BOTM[0] pointing to the first element, and with
pile 0 preceding pile 1 ... preceding pile M − 1.

H1. [Initialize.] Set i ← 0.

H2. [Point to top of pile.] Set P ← TOP[i].

H3. [Next pile.] Increase i by 1. If i = M, set LINK(P) ← Λ and terminate the
algorithm.

H4. [Is pile empty?] If BOTM[i] = Λ, go back to H3.

H5. [Tie piles together.] Set LINK(P) ← BOTM[i]. Return to H2. |
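Algorithms R and H translate almost line for line into Python if memory is modeled as one list of link fields. This sketch is an illustration under stated assumptions, not Program R: records are numbered 0 to N − 1, cell N + i of the `link` list plays the role of BOTM[i] so that convention (5) holds automatically, `None` stands for Λ, and the names `radix_list_sort` and `in_order` are hypothetical.

```python
def radix_list_sort(keys, M, p):
    """Sketch of Algorithm R, with Algorithm H as step R6.

    keys[j] is a p-tuple (a_1, ..., a_p) with 0 <= a_i < M. Returns (first, link):
    first is the index of the record with the smallest key and link[j] is the
    index of the record that follows record j, or None at the end of the list.
    """
    N = len(keys)
    if N == 0:
        return None, []
    # Cells 0..N-1 are the records' LINK fields; cell N+i stands for BOTM[i],
    # so that LINK(LOC(BOTM[i])) is BOTM[i], as in convention (5).
    link = [None] * (N + M)
    P = N - 1                                # R1: P <- LOC(R_N)
    for k in range(1, p + 1):
        top = [N + i for i in range(M)]      # R2: TOP[i] <- LOC(BOTM[i]) ...
        for i in range(M):
            link[N + i] = None               #     ... and BOTM[i] <- null
        while True:
            i = keys[P][p - k]               # R3: i <- a_{p+1-k} (0-based index p-k)
            link[top[i]] = P                 # R4: LINK(TOP[i]) <- P
            top[i] = P                       #     TOP[i] <- P
            if k == 1:                       # R5: first pass steps from R_N back to R_1
                if P == 0:
                    break
                P -= 1
            else:                            #     later passes follow the links
                P = link[P]
                if P is None:
                    break
        i = 0                                # H1
        P = top[i]                           # H2
        while True:
            i += 1                           # H3
            if i == M:
                link[P] = None
                break
            if link[N + i] is None:          # H4: pile i is empty
                continue
            link[P] = link[N + i]            # H5: tie pile i onto the list
            P = top[i]                       # back to H2
        P = link[N]                          # R6: P <- BOTM[0]
    return P, link[:N]

def in_order(first, link, keys):
    """Follow the links from `first` and return the keys in list order."""
    out, j = [], first
    while j is not None:
        out.append(keys[j])
        j = link[j]
    return out

table1 = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
keys = [tuple(int(ch) for ch in f"{n:03d}") for n in table1]
first, link = radix_list_sort(keys, 10, 3)
print([100 * a + 10 * b + c for a, b, c in in_order(first, link, keys)])
```

Because TOP[i] starts out pointing at the cell that holds BOTM[i], step R4 needs no special case for an empty pile; the same trick lets H5 set BOTM[0] correctly when pile 0 is empty.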
Figure 33 shows the contents of the piles after each of the three passes, when
our 16 example numbers are sorted with M = 10. Algorithm R is very easy to
program for MIX, once a suitable way to treat the pass-by-pass variation of steps
R3 and R5 has been found. The following program does this without sacrificing
any speed in the inner loop, by overlaying two of the instructions. Note that
TOP[i] and BOTM[i] can be packed into the same word.


Fig. 33. Radix sort using linked allocation: contents of the ten piles after each pass.

Program R (Radix list sort). The given records in locations INPUT+1 through
INPUT+N are assumed to have p = 3 components $(a_1, a_2, a_3)$ stored respectively
in the (1:1), (2:2), and (3:3) fields. (Thus M is assumed to be less than or
equal to the byte size of MIX.) The (4:5) field of each record is its LINK. We
let TOP[i] ≡ PILES + i(1:2) and BOTM[i] ≡ PILES + i(4:5), for 0 ≤ i < M. It
is convenient to make links relative to location INPUT, so that LOC(BOTM[i]) =
PILES + i − INPUT; to avoid negative links we therefore want the PILES table to be
in higher locations than the INPUT table. Index registers are assigned as follows:
rI1 ≡ P, rI2 ≡ i, rI3 ≡ 3 − k, rI4 ≡ TOP[i]; during Algorithm H, rI2 ≡ i − M.


LINK   EQU   4:5
TOP    EQU   1:2
START  ENT1  N                  1        R1. Loop on k. P ← LOC(R_N).
       ENT3  2                  1        k ← 1.
2H     ENT2  M-1                3        R2. Set piles empty.
       ENTA  PILES-INPUT,2      3M       LOC(BOTM[i])
       STA   PILES,2(TOP)       3M         → TOP[i].
       STZ   PILES,2(LINK)      3M       BOTM[i] ← Λ.
       DEC2  1                  3M
       J2NN  *-4                3M       M > i ≥ 0.
       LDA   R3SW,3             3
       STA   3F                 3        Modify instructions for pass k.
       LDA   R5SW,3             3
       STA   5F                 3
3H     [LD2  INPUT,1(3:3)]      3N       R3. Extract kth digit of key.
4H     LD4   PILES,2(TOP)       3N       R4. Adjust links.
       ST1   INPUT,4(LINK)      3N       LINK(TOP[i]) ← P.
       ST1   PILES,2(TOP)       3N       TOP[i] ← P.
5H     [DEC1 1]                 3N       R5. Step to next record.
       J1NZ  3B                 3N       To R3 if not end of pass.
6H     ENN2  M                  3        R6. Do Algorithm H.
       JMP   7F                 3        To H2 with i ← 0.
R3SW   LD2   INPUT,1(1:1)       N        Instruction for R3 when k = 3.
       LD2   INPUT,1(2:2)       N        Instruction for R3 when k = 2.
       LD2   INPUT,1(3:3)       N        Instruction for R3 when k = 1.
R5SW   LD1   INPUT,1(LINK)      N        Instruction for R5 when k = 3.
       LD1   INPUT,1(LINK)      N        Instruction for R5 when k = 2.
       DEC1  1                  N        Instruction for R5 when k = 1.
9H     LDA   PILES+M,2(LINK)    3M-3     H4. Is pile empty?
       JAZ   8F                 3M-3     To H3 if BOTM[i] = Λ.
       STA   INPUT,1(LINK)      3M-3-E   H5. Tie piles together.
7H     LD1   PILES+M,2(TOP)     3M-E     H2. Point to top of pile.
8H     INC2  1                  3M       H3. Next pile. i ← i + 1.
       J2NZ  9B                 3M       To H4 if i ≠ M.
       STZ   INPUT,1(LINK)      3        LINK(P) ← Λ.
       LD1   PILES(LINK)        3        P ← BOTM[0].
       DEC3  1                  3
       J3NN  2B                 3        Loop for 1 ≤ k ≤ 3. |

The running time of Program R is 32N + 48M + 38 − 4E, where N is the
number of input records, M is the radix (the number of piles), and E is the
number of occurrences of empty piles. This compares very favorably with other
programs we have constructed based on similar assumptions (Programs 5.2.1M,
5.2.4L). A p-pass version of the program would take (11p − 1)N + O(pM) units
of time; the critical factor in the timing is the inner loop, which involves five
references to memory and one branch. On a typical computer we will have
$M = b^r$ and $p = \lceil t/r \rceil$, where t is the number of radix-b digits in the keys;
increasing r will decrease p, so the formulas can be used to determine a best
value of r.
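A rough way to carry out that optimization is sketched below in Python. The cost model is an assumption read off Program R: about 11 units per record per pass (10 on the first pass), giving (11p − 1)N, plus about 16 units per pile per pass (the 48M term divided by three passes), with the constant term and the −4E saving ignored. The function names are hypothetical, and a different machine would have different constants.

```python
import math

def approx_time(N, t, r, b=2):
    """Rough running time of a p-pass Program R with M = b**r piles, using the
    assumed model (11p - 1)N + 16pM (constant term and -4E saving ignored)."""
    M = b ** r
    p = math.ceil(t / r)          # passes needed for t radix-b digits
    return (11 * p - 1) * N + 16 * p * M

def best_r(N, t, b=2):
    """Return (r, time) minimizing approx_time over 1 <= r <= t."""
    return min(((r, approx_time(N, t, r, b)) for r in range(1, t + 1)),
               key=lambda pair: pair[1])

print(best_r(1000, 30))   # (6, 59120): five passes with M = 64
```

For p = 3 the model reduces to 32N + 48M, matching the leading terms of Program R's running time.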

The only variable in the timing is E, the number of empty piles observed
in step H4. If we consider each of the $M^N$ sequences of radix-M digits to be
equally probable, we know from our study of the "poker test" in Section 3.3.2D
that there are M − r empty piles with probability

$$\frac{M(M-1)\ldots(M-r+1)}{M^N} {N \brace r} \tag{6}$$

on each pass, where ${N \brace r}$ is a Stirling number of the second kind. By exercise 6,

$$E = \Bigl(\min \max(M-N,\,0)\,p,\ \ \text{ave } M\bigl(1-\tfrac{1}{M}\bigr)^{N} p,\ \ \max\,(M-1)\,p\Bigr). \tag{7}$$
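Formulas (6) and (7) are easy to check by brute force for small M and N. The Python sketch below, with hypothetical function names, evaluates (6) in exact rational arithmetic, compares it with a direct count over all $M^N$ digit sequences, and confirms that the average number of empty piles per pass is $M(1 - 1/M)^N$.

```python
from fractions import Fraction
from itertools import product
from math import prod

def stirling2(n, k):
    """Stirling number of the second kind, from S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def prob_empty(M, N, e):
    """Formula (6): probability that exactly e = M - r of the M piles are empty."""
    r = M - e
    falling = prod(range(M - r + 1, M + 1))   # M(M-1)...(M-r+1)
    return Fraction(falling * stirling2(N, r), M ** N)

def brute_force(M, N):
    """Exact distribution of the number of empty piles, by listing all M**N sequences."""
    counts = [0] * (M + 1)
    for digits in product(range(M), repeat=N):
        counts[M - len(set(digits))] += 1
    return [Fraction(c, M ** N) for c in counts]

M, N = 4, 5
dist = [prob_empty(M, N, e) for e in range(M + 1)]
assert dist == brute_force(M, N)
mean = sum(e * q for e, q in enumerate(dist))
print(mean, M * Fraction(M - 1, M) ** N)   # 243/256 243/256
```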

An  ever-increasing  number  of  “pipeline”  or  “number-crunching”  computers 
have  appeared  in  recent  years.  These  machines  have  multiple  arithmetic  units 
and  look-ahead  circuitry  so  that  memory  references  and  computation  can  be 
highly  overlapped;  but  their  efficiency  deteriorates  noticeably  in  the  presence  of 
conditional  branch  instructions  unless  the  branch  almost  always  goes  the  same 
way.  The  inner  loop  of  a radix  sort  is  well  adapted  to  such  machines,  because 
it  is  a straight  iterative  calculation  of  typical  number-crunching  form.  Therefore 
radix  sorting  is  usually  more  efficient  than  any  other  known  method  for  internal 
sorting on such machines, provided that N is not too small and the keys are not
too  long. 

Of  course,  radix  sorting  is  not  very  efficient  when  the  keys  are  extremely 
long.  For  example,  imagine  sorting  60-digit  decimal  numbers  with  20  passes  of  a 
radix sort, using $M = 10^3$; very few pairs of numbers will tend to have identical
keys  in  their  leading  9 digits,  so  the  first  17  passes  accomplish  very  little.  In  our 
analysis  of  radix  exchange  sorting,  we  found  that  it  was  unnecessary  to  inspect 
many  bits  of  the  key,  when  we  looked  at  the  keys  from  the  left  instead  of  the 
right.  Let  us  therefore  reconsider  the  idea  of  a radix  sort  that  starts  at  the  most 
significant  digit  (MSD)  instead  of  the  least  significant  digit  (LSD). 

We  have  already  remarked  that  an  MSD-first  radix  method  suggests  itself 
naturally;  in  fact,  it  is  not  hard  to  see  why  the  post  office  uses  such  a method 
to  sort  mail.  A large  collection  of  letters  can  be  sorted  into  separate  bags  for 
different  geographical  areas;  each  of  these  bags  then  contains  a smaller  number 
of  letters  that  can  be  sorted  independently  of  the  other  bags,  into  finer  and 
finer  geographical  divisions.  (Indeed,  bags  of  letters  can  be  transported  nearer 
to  their  destinations  before  they  are  sorted  further,  or  as  they  are  being  sorted 
further.)  This  principle  of  “divide  and  conquer”  is  quite  appealing,  and  the 
only  reason  it  doesn’t  work  especially  well  for  sorting  punched  cards  is  that  it 
ultimately  spends  too  much  time  fussing  with  very  small  piles.  Algorithm  R is 
relatively  efficient,  even  though  it  considers  LSD  first,  since  we  never  have  more 
than  M piles,  and  the  piles  need  to  be  hooked  together  only  p times.  On  the 
other  hand,  it  is  not  difficult  to  design  an  MSD-first  radix  method  using  linked 
memory,  with  negative  links  as  in  Algorithm  5.2.4L  to  denote  the  boundaries 
between piles. (See exercise 10.) The main difficulty is that empty piles tend to
proliferate  and  to  consume  a great  deal  of  time  in  an  MSD-first  method. 
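The divide-and-conquer idea can be sketched recursively in Python. This version is a simplification, not the linked-memory method that exercise 10 asks for: piles are ordinary lists, M stays fixed at every level, and subfiles of at most `cutoff` keys (an arbitrary illustrative threshold) are finished by straight insertion. The loop over piles still visits every empty pile, which is exactly where an MSD-first method spends its extra time. The function names are hypothetical.

```python
def straight_insertion(a):
    """Sort list a in place by straight insertion and return it."""
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

def msd_radix_sort(keys, M, p, cutoff=16, d=0):
    """Sort a list of p-tuples of digits in range(M), most significant digit first."""
    if len(keys) <= cutoff or d == p:      # short subfile, or all digits used up
        return straight_insertion(list(keys))
    piles = [[] for _ in range(M)]
    for key in keys:
        piles[key[d]].append(key)          # distribute on digit d
    result = []
    for pile in piles:                     # each pile is sorted independently
        if pile:                           # empty piles still cost a test
            result.extend(msd_radix_sort(pile, M, p, cutoff, d + 1))
    return result
```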

Perhaps  the  best  compromise  has  been  suggested  by  M.  D.  MacLaren  [JACM 
13  (1966),  404-411],  who  recommends  an  LSD-first  sort  as  in  Algorithm  R,  but 
applied  only  to  the  most  significant  digits.  This  does  not  completely  sort  the  file, 
but  it  usually  brings  the  file  very  nearly  into  order  so  that  very  few  inversions 
remain;  therefore  straight  insertion  can  be  used  to  finish  up.  Our  analysis  of 
Program 5.2.1M applies also to this situation, so that if the keys are uniformly
distributed we will have an average of $\frac14 N(N-1)M^{-p}$ inversions remaining in
the file after sorting on the leading p digits. (See Eq. 5.2.1-(17) and exercise
5.2.1-38.) MacLaren has computed the average number of memory references
per item sorted, and the optimum choice of M and p (assuming that M is
a power of 2, that the keys are uniformly distributed, and that $N/M^p < 0.1$
so that deviations from uniformity are tolerable) turns out to be given by the
following table:

N        =  100    1000   10000   100000   1000000   10^7   10^8   10^9
best M   =  32     128    512     1024     8192      2^15   2^17   2^19
best p   =  2      2      2       2        2         2      2      2
β(N)     =  19.3   18.5   18.2    18.1     18.0      18.0   18.0   18.0

Here β(N) denotes the average number of memory references per item sorted,

$$\beta(N) = 5p + 8 + \frac{2pM}{N} + \frac{N-1}{2M^p} - \frac{H_N}{N}; \tag{8}$$


it is bounded as N → ∞, if we take p = 2 and M > √N, so the average sorting
time is actually O(N) instead of order N log N. This method is an improvement
over multiple list insertion (Program 5.2.1M), which is essentially the case p = 1.
Exercise  12  gives  MacLaren’s  interesting  procedure  for  final  rearrangement  of  a 
partially  list-sorted  file. 
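The entries of MacLaren's table can be checked against formula (8). This Python sketch rests on assumptions of its own: it searches p ≤ 3 and $M = 2^j$ with j ≤ 30, treats $N/M^p < 0.1$ as a hard constraint, and replaces $H_N$ by its asymptotic series $\ln N + \gamma + 1/(2N) - 1/(12N^2)$. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    """Asymptotic approximation to H_n, far more accurate than needed for n >= 100."""
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)

def beta(N, M, p):
    """Formula (8): average memory references per item sorted."""
    return 5 * p + 8 + 2 * p * M / N + (N - 1) / (2 * M ** p) - harmonic(N) / N

def best_parameters(N, max_log_M=30, max_p=3):
    """Minimize beta over M = 2**j and p, subject to N / M**p < 0.1."""
    best = None
    for p in range(1, max_p + 1):
        for j in range(1, max_log_M + 1):
            M = 2 ** j
            if 10 * N >= M ** p:           # violates N / M**p < 0.1
                continue
            b = beta(N, M, p)
            if best is None or b < best[2]:
                best = (p, M, b)
    return best

for N in [10 ** e for e in range(2, 10)]:
    p, M, b = best_parameters(N)
    print(N, M, p, round(b, 1))
```

Under these assumptions the search reproduces every column of the table, and it also shows that the constraint $N/M^p < 0.1$, not formula (8) alone, is what rules out smaller M for the smaller values of N.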

It is also possible to avoid the link fields, using the methods of Algorithm
5.2D and exercise 5.2-13, so that only O(√N) memory locations are
needed  in  addition  to  the  space  required  for  the  records  themselves.  The  average 
sorting  time  is  proportional  to  N if  the  input  records  are  uniformly  distributed. 

W.  Dobosiewicz  obtained  good  results  by  using  an  MSD-first  distribution 
sort  until  reaching  short  subfiles,  with  the  distribution  process  constrained  so 
that  the  first  M/2  piles  were  guaranteed  to  receive  between  25%  and  75%  of  the 
records  [see  Inf.  Proc.  Letters  7 (1978),  1-6;  8 (1979),  170-172];  this  ensured 
that  the  average  time  to  sort  uniform  keys  would  be  O(N)  while  the  worst  case 
would be O(N log N). His papers inspired several other researchers to devise
new address calculation algorithms, of which the most instructive is perhaps the
following 2-level scheme due to Markku Tamminen [J. Algorithms 6 (1985), 138-144]:
Assume that all keys are fractions in the interval [0 . . 1). First distribute
the N records into ⌊N/8⌋ bins by mapping key K into bin ⌊KN/8⌋. Then suppose
bin k has received $N_k$ records; if $N_k \le 16$, sort it by straight insertion, otherwise
sort it by a MacLaren-like distribution-plus-insertion sort into $M_2$ bins, where
$M_2 \approx 10N_k$. Tamminen proved the following remarkable result:

Theorem T. There is a constant T such that the sorting method just described
performs at most TN operations on the average, whenever the keys
are independent random numbers whose density function f(x) is bounded and
Riemann-integrable for 0 < x < 1. (The constant T does not depend on f.)

Proof. See exercise 18. Intuitively, the first distribution into N/8 piles finds
intervals in which f is approximately constant; the second distribution will then
make the expected bin size approximately constant. |
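Tamminen's two-level scheme can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the text: the first level allocates ⌊N/8⌋ + 1 bins so that every index ⌊KN/8⌋ has a place, the second level uses exactly $10N_k$ equal-width bins spanning bin k's interval [8k/N, 8(k+1)/N), and each second-level file is finished by one straight insertion pass over the whole bin, in MacLaren's manner. The function names are hypothetical.

```python
import random

def straight_insertion(a):
    """Sort list a in place by straight insertion and return it."""
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

def maclaren_like(keys, lo, hi, nbins):
    """Distribute keys from [lo, hi) into nbins equal-width piles, concatenate
    the piles in order, and finish with straight insertion."""
    piles = [[] for _ in range(nbins)]
    for x in keys:
        b = int((x - lo) / (hi - lo) * nbins)
        piles[max(0, min(b, nbins - 1))].append(x)   # clamp against rounding error
    return straight_insertion([x for pile in piles for x in pile])

def tamminen_sort(keys):
    """Two-level distribution sort for keys in [0, 1)."""
    N = len(keys)
    bins = [[] for _ in range(N // 8 + 1)]   # floor(K*N/8) lies in 0..floor(N/8)
    for x in keys:
        bins[int(x * N / 8)].append(x)
    result = []
    for k, b in enumerate(bins):
        if len(b) <= 16:
            result.extend(straight_insertion(b))
        else:
            result.extend(maclaren_like(b, 8 * k / N, 8 * (k + 1) / N, 10 * len(b)))
    return result
```

Even with a skewed density, say keys of the form $u^4$ for uniform u, a crowded first-level bin is simply handed to the second-level distribution, which is the intuition behind Theorem T.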

Several versions of radix sort that have been well tuned for sorting large
arrays of alphabetic strings are described in an instructive article by P. M.
McIlroy, K. Bostic, and M. D. McIlroy, Computing Systems 6 (1993), 5-27.

EXERCISES 

► 1.  [20]  The  algorithm  of  exercise  5.2-13  shows  how  to  do  a distribution  sort  with 
only  N record  areas  (and  M count  fields),  instead  of  2 N record  areas.  Does  this  lead 
to  an  improvement  over  the  radix  sorting  algorithm  illustrated  in  Table  1? 

2.  [13]  Is  Algorithm  R a stable  sorting  method? 

3. [15] Explain why Algorithm H makes BOTM[0] point to the first record in the
"hooked-up" queue, even though pile 0 might be empty.

► 4.  [23]  Algorithm  R keeps  the  M piles  linked  together  as  queues  (first-in-first-out). 
Explore  the  idea  of  linking  the  piles  as  stacks  instead.  (The  arrows  in  Fig.  33  would 
go  downward  instead  of  upward,  and  the  BOTM  table  would  be  unnecessary.)  Show  that 
if  the  piles  are  “hooked  together”  in  an  appropriate  order,  it  is  possible  to  achieve  a 
valid  sorting  method.  Does  this  lead  to  a simpler  or  a faster  algorithm? 

5.  [20]  What  changes  are  necessary  to  Program  R so  that  it  sorts  eight-byte  keys 
instead of three-byte keys? Assume that the most significant bytes of $K_i$ are stored in
location KEY+i(1:5), while the three least significant bytes are in location INPUT+i(1:3)
as  presently.  What  is  the  running  time  of  the  program,  after  these  changes  have  been 
made? 

6. [M24] Let $g_{MN}(z) = \sum_k p_{MNk} z^k$, where $p_{MNk}$ is the probability that exactly k
empty piles are present after a random radix-sort pass puts N elements into M piles.

a) Show that $g_{M(N+1)}(z) = g_{MN}(z) + ((1 - z)/M)\, g'_{MN}(z)$.

b)  Use  this  relation  to  find  simple  expressions  for  the  mean  and  variance  of  this 
probability  distribution,  as  a function  of  M and  N. 

7. [20] Discuss the similarities and differences between Algorithm R and radix exchange
sorting (Algorithm 5.2.2R).

► 8.  [20]  The  radix-sorting  algorithms  discussed  in  the  text  assume  that  all  keys  being 
sorted  are  nonnegative.  What  changes  should  be  made  to  the  algorithms  when  the  keys 
are  numbers  expressed  in  two’s  complement  or  ones’  complement  notation? 

9.  [20]  Continuing  exercise  8,  what  changes  should  be  made  to  the  algorithms  when 
the  keys  are  numbers  expressed  in  signed  magnitude  notation? 
10. [30] Design an efficient most-significant-digit-first radix-sorting algorithm that
uses  linked  memory.  (As  the  size  of  the  subfiles  decreases,  it  is  wise  to  decrease  M,  and 
to  use  a nonradix  method  on  the  really  short  subfiles.) 

11.  [16]  The  sixteen  input  numbers  shown  in  Table  1 start  with  41  inversions;  after 
sorting  is  complete,  of  course,  there  are  no  inversions  remaining.  How  many  inversions 
would  be  present  in  the  file  if  we  omitted  pass  1,  doing  a radix  sort  only  on  the  tens 
and  hundreds  digits?  How  many  inversions  would  be  present  if  we  omitted  both  pass  1 
and  pass  2? 

12.  [24]  (M.  D.  MacLaren.)  Suppose  that  Algorithm  R has  been  applied  only  to  the 
p leading  digits  of  the  actual  keys;  thus  the  file  is  nearly  sorted  when  we  read  it  in 
the  order  of  the  links,  but  keys  that  agree  in  their  first  p digits  may  be  out  of  order. 
Design  an  algorithm  that  rearranges  the  records  in  place  so  that  their  keys  are  in  order, 
$K_1 \le K_2 \le \cdots \le K_N$. [Hint: The special case that the file is perfectly sorted appears
in  the  answer  to  exercise  5.2-12;  it  is  possible  to  combine  this  with  straight  insertion 
without  loss  of  efficiency,  since  few  inversions  remain  in  the  file.] 

13.  [40]  Implement  the  internal  sorting  method  suggested  in  the  text  at  the  close  of 
this  section,  producing  a subroutine  that  sorts  random  data  in  O(N)  units  of  time  with 
only $O(\sqrt{N})$ additional memory locations.

14.  [22]  The  sequence  of  playing  cards 


can be sorted into increasing order A 2 ... J Q K from top to bottom in two passes,
using just two piles for intermediate storage: Deal the cards face down into two piles
containing respectively A 2 9 3 10 and 4 J 5 6 Q K 7 8 (from bottom to top); then put
the second pile on the first, turn the deck face up, and deal into two piles A 2 3 4 5 6 7 8,
9 10 J Q K. Combine these piles, turn them face up, and you’re done.

Prove  that  this  sequence  of  cards  cannot  be  sorted  into  decreasing  order  K Q J . . . 2 A 
from  top  to  bottom  in  two  passes,  even  if  you  are  allowed  to  use  up  to  three  piles  for 
intermediate  storage.  (Dealing  must  always  be  from  the  top  of  the  deck,  turning  the 
cards  face  down  as  they  are  dealt.  Top  to  bottom  is  right  to  left  in  the  illustration.) 

15.  [M25]  Consider  the  problem  of  exercise  14  when  all  cards  must  be  dealt  face  up 
instead  of  face  down.  Thus,  one  pass  can  be  used  to  convert  increasing  order  into 
decreasing  order.  How  many  passes  are  required? 

► 16. [25] Design an algorithm to sort strings $\alpha_1, \ldots, \alpha_n$ on an m-letter alphabet into
lexicographic order. The total running time of your algorithm should be $O(m + n + N)$,
where $N = |\alpha_1| + \cdots + |\alpha_n|$ is the total length of all the strings.
17. [15] In the two-level distribution sort proposed by Tamminen (see Theorem T),
why  is  a MacLaren-like  method  used  for  the  second  level  of  distribution  but  not  the 
first  level? 

18.  [HM26]  Prove  Theorem  T.  Hint:  Show  first  that  MacLaren’s  distribution-plus- 
insertion  algorithm  does  O(BN)  operations,  on  the  average,  when  it  is  applied  to 
independent  random  keys  whose  probability  density  function  satisfies  f(x)  < B for 
0 < x < 1. 


For  sorting  the  roots  and  words 
we  had  the  use  of  1100  lozenge  boxes, 
and  used  trays  for  the  forms. 
— GEORGE  V.  WIGRAM  (1843) 
5.3. OPTIMUM SORTING

Now  THAT  WE  have  analyzed  a great  many  methods  for  internal  sorting,  it  is 
time to turn to a broader question: What is the best possible way to sort? Can
we  place  limits  on  the  maximum  sorting  speeds  that  will  ever  be  achievable,  no 
matter  how  clever  a programmer  might  be? 

Of  course  there  is  no  best  possible  way  to  sort;  we  must  define  precisely 
what  is  meant  by  “best,”  and  there  is  no  best  possible  way  to  define  “best.” 
We have discussed similar questions about the theoretical optimality of algorithms
in Sections 4.3.3, 4.6.3, and 4.6.4, where high-precision multiplication
and  polynomial  evaluation  were  considered.  In  each  case  it  was  necessary  to 
formulate  a rather  simple  definition  of  a “best  possible”  algorithm,  in  order  to 
give  sufficient  structure  to  the  problem  to  make  it  workable.  And  in  each  case 
we  ran  into  interesting  problems  that  are  so  difficult  they  still  haven’t  been 
completely  resolved.  The  same  situation  holds  for  sorting;  some  very  interesting 
discoveries  have  been  made,  but  many  fascinating  questions  remain  unanswered. 

Studies  of  the  inherent  complexity  of  sorting  have  usually  been  directed 
towards  minimizing  the  number  of  times  we  make  comparisons  between  keys 
while sorting n items, or merging m items with n, or selecting the $t$th largest of an
unordered  set  of  n items.  Sections  5.3.1,  5.3.2,  and  5.3.3  discuss  these  questions 
in  general,  and  Section  5.3.4  deals  with  similar  issues  under  the  interesting 
restriction  that  the  pattern  of  comparisons  must  essentially  be  fixed  in  advance. 
Several  other  types  of  interesting  theoretical  questions  related  to  optimum  sorting 
appear  in  the  exercises  for  Section  5.3.4,  and  in  the  discussion  of  external  sorting 
(Sections  5.4.4,  5.4.8,  and  5.4.9). 

As soon as an Analytical Engine exists,
it  will  necessarily  guide  the  future  course  of  the  science. 

Whenever  any  result  is  sought  by  its  aid, 
the  question  will  then  arise  — 
By  what  course  of  calculation  can  these 
results  be  arrived  at  by  the  machine 
in  the  shortest  time? 

— CHARLES  BABBAGE  (1864) 


5.3.1.  Minimum-Comparison  Sorting 

The  minimum  number  of  key  comparisons  needed  to  sort  n elements  is  obviously 
zero, because we have seen radix methods that do no comparisons at all. In fact,
it  is  possible  to  write  MIX  programs  that  are  able  to  sort,  although  they  contain 
no  conditional  jump  instructions  at  all!  (See  exercise  5-8  at  the  beginning  of  this 
chapter.)  We  have  also  seen  several  sorting  methods  that  are  based  essentially 
on  comparisons  of  keys,  yet  their  running  time  in  practice  is  dominated  by  other 
considerations  such  as  data  movement,  housekeeping  operations,  etc. 

Therefore it is clear that comparison counting is not the only way to measure
the effectiveness of a sorting method. But it is fun to scrutinize the number of
comparisons anyway, since a theoretical study of this subject gives us a good
deal of useful insight into the nature of sorting processes, and it also helps us to
sharpen our wits for the more mundane problems that confront us at other times.

Fig. 34. A comparison tree for sorting three elements. [Figure not reproduced; its levels are numbered 0 through 3.]

In  order  to  rule  out  radix-sorting  methods,  which  do  no  comparisons  at 
all,  we  shall  restrict  our  discussion  to  sorting  techniques  that  are  based  solely 
on  an  abstract  linear  ordering  relation  “<”  between  keys,  as  discussed  at  the 
beginning  of  this  chapter.  For  simplicity,  we  shall  also  confine  our  discussion  to 
the  case  of  distinct  keys,  so  that  there  are  only  two  possible  outcomes  of  any 
comparison of $K_i$ versus $K_j$: either $K_i < K_j$ or $K_i > K_j$. (For an extension
of  the  theory  to  the  general  case  where  equal  keys  are  allowed,  see  exercises  3 
through  12.  For  bounds  on  the  worst-case  running  time  that  is  needed  to  sort 
integers  without  the  restriction  to  comparison-based  methods,  see  Fredman  and 
Willard,  J.  Computer  and  Syst.  Sci.  47  (1993),  424-436;  Ben-Amram  and  Galil, 
J.  Comp.  Syst.  Sci.  54  (1997),  345-370;  Thorup,  SODA  9 (1998),  550-555.) 

The  problem  of  sorting  by  comparisons  can  also  be  expressed  in  other 
equivalent  ways.  Given  a set  of  n distinct  weights  and  a balance  scale,  we  can 
ask  for  the  least  number  of  weighings  necessary  to  completely  rank  the  weights  in 
order  of  magnitude,  when  the  pans  of  the  balance  scale  can  each  accommodate 
only  one  weight.  Alternatively,  given  a set  of  n players  in  a tournament,  we 
can  ask  for  the  smallest  number  of  games  that  suffice  to  rank  all  contestants, 
assuming  that  the  strengths  of  the  players  can  be  linearly  ordered  (with  no  ties). 

All  n-element  sorting  methods  that  satisfy  the  constraints  above  can  be 
represented  in  terms  of  an  extended  binary  tree  structure  such  as  that  shown 
in Fig. 34. Each internal node (drawn as a circle) contains two indices “i:j”
denoting a comparison of $K_i$ versus $K_j$. The left subtree of this node represents
the subsequent comparisons to be made if $K_i < K_j$, and the right subtree
represents the actions to be taken when $K_i > K_j$. Each external node of the tree
(drawn as a box) contains a permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, denoting
the fact that the ordering

$$K_{a_1} < K_{a_2} < \cdots < K_{a_n}$$

has been established. (If we look at the path from the root to this external node,
each of the n - 1 relationships $K_{a_i} < K_{a_{i+1}}$ for $1 \le i < n$ will be the result of
some comparison $a_i{:}a_{i+1}$ or $a_{i+1}{:}a_i$ on this path.)
Fig. 35. Example of a redundant comparison.

Thus Fig. 34 represents a sorting method that first compares $K_1$ with $K_2$;
if $K_1 > K_2$, it goes on (via the right subtree) to compare $K_2$ with $K_3$, and
then if $K_2 < K_3$ it compares $K_1$ with $K_3$; finally if $K_1 > K_3$ it knows that
$K_2 < K_3 < K_1$. An actual sorting algorithm will usually also move the keys
around in the file, but we are interested here only in the comparisons, so we
ignore all data movement. A comparison of $K_i$ with $K_j$ in this tree always
means the original keys $K_i$ and $K_j$, not the keys that might currently occupy
the ith and jth positions of the file after the records have been shuffled around.
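A comparison tree is just a nest of conditionals, as the following Python sketch shows. The text describes only the root 1:2 and the path through the right subtree; the remaining branches below are filled in with the natural choices (an assumption), so this is one valid three-element tree rather than a transcription of Fig. 34. The function name `fig34_sort3` is hypothetical.

```python
from itertools import permutations

def fig34_sort3(k1, k2, k3):
    """Walk a three-element comparison tree. Return the external node (a1, a2, a3),
    meaning K_a1 < K_a2 < K_a3, together with the number of comparisons made."""
    K = {1: k1, 2: k2, 3: k3}
    count = 0
    def lt(i, j):                  # internal node "i:j" compares the ORIGINAL keys K_i, K_j
        nonlocal count
        count += 1
        return K[i] < K[j]
    if lt(1, 2):                   # root 1:2; left subtree (assumed shape)
        if lt(2, 3):
            leaf = (1, 2, 3)
        elif lt(1, 3):
            leaf = (1, 3, 2)
        else:
            leaf = (3, 1, 2)
    else:                          # K1 > K2: right subtree, following the text
        if lt(2, 3):
            leaf = (2, 1, 3) if lt(1, 3) else (2, 3, 1)
        else:
            leaf = (3, 2, 1)
    return leaf, count

if __name__ == "__main__":
    for keys in permutations((10, 20, 30)):
        print(keys, fig34_sort3(*keys))
```

Each of the 3! = 6 inputs reaches a different external node, and no path uses more than three comparisons.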

It is possible to make redundant comparisons; for example, in Fig. 35 there
is no reason to compare 3:1, since $K_1 < K_2$ and $K_2 < K_3$ imply that $K_1 < K_3$.
No permutation can possibly correspond to the left subtree of node 3:1 in Fig. 35;
consequently that part of the algorithm will never be performed! Since we are
interested in minimizing the number of comparisons, we may assume that no
redundant comparisons are made. Hence we have an extended binary tree structure
in which every external node corresponds to a permutation. All permutations of
the input keys are possible, and every permutation defines a unique path from
the root to an external node; it follows that there are exactly n! external nodes
in a comparison tree that sorts n elements with no redundant comparisons.

The  best  worst  case.  The  first  problem  that  arises  naturally  is  to  find 
comparison  trees  that  minimize  the  maximum  number  of  comparisons  made. 
(Later  we  shall  consider  the  average  number  of  comparisons.) 

Let S(n) be the minimum number of comparisons that will suffice to sort
n elements. If all the internal nodes of a comparison tree are at levels < k, it is
obvious that there can be at most $2^k$ external nodes in the tree. Hence, letting
k = S(n), we have

$$n! \le 2^{S(n)}.$$

Since S(n) is an integer, we can rewrite this formula to obtain the lower bound

$$S(n) \ge \lceil \lg n! \rceil. \tag{1}$$

Stirling’s approximation tells us that

$$\lceil \lg n! \rceil = n \lg n - n/\ln 2 + \tfrac{1}{2} \lg n + O(1), \tag{2}$$

hence roughly $n \lg n$ comparisons are needed.
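Bound (1) can be evaluated exactly with integer arithmetic, because for an integer x ≥ 1, ⌈lg x⌉ equals the bit length of x − 1. The Python sketch below uses hypothetical function names. It also subtracts the leading terms of (2), and the difference stays bounded, as the O(1) term predicts.

```python
import math

def info_lower_bound(n):
    """Exact ceil(lg n!), using ceil(lg x) == (x - 1).bit_length() for integers x >= 1."""
    return (math.factorial(n) - 1).bit_length()

def stirling_terms(n):
    """The leading terms n lg n - n/ln 2 + (1/2) lg n of (2), without the O(1) part."""
    return n * math.log2(n) - n / math.log(2) + 0.5 * math.log2(n)

if __name__ == "__main__":
    for n in (5, 12, 21, 100, 1000):
        exact = info_lower_bound(n)
        print(n, exact, round(exact - stirling_terms(n), 3))
```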
Relation (1) is often called the information-theoretic lower bound, since
cognoscenti of information theory would say that $\lg n!$ “bits of information” are
being acquired during a sorting process; each comparison yields at most one bit of
information. Trees such as Fig. 34 have also been called “questionnaires”; their
mathematical properties were first explored systematically in Claude Picard’s
book Théorie des Questionnaires (Paris: Gauthier-Villars, 1965).

Of all the sorting methods we have seen, the three that require fewest comparisons
are binary insertion (see Section 5.2.1), tree selection (see Section 5.2.3),
and straight two-way merging (see Algorithm 5.2.4L). The maximum number of
comparisons for binary insertion is readily seen to be

$$B(n) = \sum_{k=1}^{n} \lceil \lg k \rceil = n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1 \tag{3}$$

by  exercise  1.2.4-42,  and  the  maximum  number  of  comparisons  in  two-way 
merging  is  given  in  exercise  5.2.4-14.  We  will  see  in  Section  5.3.3  that  tree 
selection  has  the  same  bound  on  its  comparisons  as  either  binary  insertion  or 
two-way  merging,  depending  on  how  the  tree  is  set  up.  In  all  three  cases  we 
achieve  an  asymptotic  value  of  n lg  n;  combining  these  lower  and  upper  bounds 
for  S(n)  proves  that 


lim 


s(n) 

nlgn 


= 1. 


(4) 
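Formula (3) is easy to check numerically. The Python sketch below uses hypothetical names. It sums ⌈lg k⌉ directly, compares the sum with the closed form in (3), and prints the ratio B(n)/(n lg n). The limit of that ratio being 1 is the upper-bound half of (4).

```python
import math

def binary_insertion_worst(n):
    """B(n) as a sum: the kth insertion needs ceil(lg k) = (k - 1).bit_length() comparisons."""
    return sum((k - 1).bit_length() for k in range(1, n + 1))

def binary_insertion_closed(n):
    """Closed form (3): n*ceil(lg n) - 2**ceil(lg n) + 1."""
    c = (n - 1).bit_length()          # ceil(lg n) for n >= 1
    return n * c - 2 ** c + 1

if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        b = binary_insertion_closed(n)
        print(n, b, round(b / (n * math.log2(n)), 4))
```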


Thus  we  have  an  approximate  formula  for  S(n),  but  it  is  desirable  to  obtain 
more  precise  information.  The  following  table  gives  exact  values  of  the  lower 
and  upper  bounds  discussed  above,  for  small  n: 


          n =  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    ⌈lg n!⌉ =  0  1  3  5  7 10 13 16 19 22 26 29 33 37 41 45 49
       B(n) =  0  1  3  5  8 11 14 17 21 25 29 33 37 41 45 49 54
       L(n) =  0  1  3  5  9 11 14 17 25 27 30 33 38 41 45 49 65

Here B(n) and L(n) refer respectively to binary insertion and two-way list
merging. It can be shown that B(n) ≤ L(n) for all n (see exercise 2).

From the table above, we can see that S(4) = 5, but S(5) might be either
7 or  8.  This  brings  us  back  to  a problem  stated  at  the  beginning  of  Section  5.2: 
What  is  the  best  way  to  sort  five  elements?  Can  five  elements  be  sorted  using 
only  seven  comparisons? 

The answer is yes, but a seven-step procedure is not especially easy to
discover. We begin by first comparing $K_1{:}K_2$, then $K_3{:}K_4$, then the larger
elements of these pairs. This produces a configuration that may be diagrammed

        b ----> d
        ^       ^
        |       |
        a       c       e          (5)

to  indicate  that  a < b < d and  c < d.  (It  is  convenient  to  represent  known 
ordering  relations  between  elements  by  drawing  directed  graphs  such  as  this, 
where x is known to be less than y if and only if there is a path from x to y in
the graph.) At this point we insert the fifth element $K_5 = e$ into its proper place
among {a, b, d}; only two comparisons are needed, since we may compare it first
with b and then with a or d. This leaves one of four possibilities,

$$e < a < b < d, \qquad a < e < b < d, \qquad a < b < e < d, \qquad a < b < d < e \qquad (\text{with } c < d), \tag{6}$$

and  in  each  case  we  can  insert  c among  the  remaining  elements  less  than  d in 
one  or  two  more  comparisons.  This  method  for  sorting  five  elements  was  first 
found  by  H.  B.  Demuth  [Ph.D.  thesis,  Stanford  University  (1956),  41-43]. 
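The seven-comparison procedure just described translates directly into code. The Python sketch below (hypothetical function name `sort_five`) counts every key comparison and uses binary insertion for both insertion steps. Running it on all 120 orderings of five distinct keys confirms the bound.

```python
from itertools import permutations

def sort_five(keys):
    """Sort five distinct keys with at most seven comparisons; return (sorted list, count)."""
    count = 0
    def less(x, y):
        nonlocal count
        count += 1
        return x < y
    k1, k2, k3, k4, e = keys
    a, b = (k1, k2) if less(k1, k2) else (k2, k1)    # a < b
    c, d = (k3, k4) if less(k3, k4) else (k4, k3)    # c < d
    if less(d, b):                                   # compare the larger elements
        a, b, c, d = c, d, a, b                      # rename so that b < d
    # now a < b < d and c < d, as in (5); insert e using two comparisons
    if less(e, b):
        pos = 0 if less(e, a) else 1
    else:
        pos = 2 if less(e, d) else 3
    chain = [a, b, d]
    chain.insert(pos, e)
    # c < d, so c belongs among the two or three chain elements that precede d
    lo, hi = 0, (3 if pos < 3 else 2)
    while lo < hi:
        mid = (lo + hi) // 2
        if less(c, chain[mid]):
            hi = mid
        else:
            lo = mid + 1
    chain.insert(lo, c)
    return chain, count

if __name__ == "__main__":
    worst = max(sort_five(list(p))[1] for p in permutations(range(5)))
    print("worst case:", worst)
```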

Merge  insertion.  A pleasant  generalization  of  the  method  above  has  been 
discovered  by  Lester  Ford,  Jr.  and  Selmer  Johnson.  Since  it  involves  some  aspects 
of  merging  and  some  aspects  of  insertion,  we  shall  call  it  merge  insertion.  For 
example,  consider  the  problem  of  sorting  21  elements.  We  start  by  comparing 
the  ten  pairs  A'i : K2,  A3 : K.\, . . . , /\  i9 : A'2o ; then  we  sort  the  ten  larger  elements 
of  the  pairs,  using  merge  insertion.  As  a result  we  obtain  the  configuration 


“1  a.  2 a 3 a4  ag  ae  07  ag  ag  a 10 

rr  77  77  77  7'/ . 

bi  fe2  63  64  bg  be  67  bg  bg  fei0  fen 

analogous  to  (5).  The  next  step  is  to  insert  63  among  {61,  ai,  a2},  then  b2  among 
the  other  elements  less  than  a2;  we  arrive  at  the  configuration 


ci  c2  C3  C4  C5  eg  a4  a 5 ae  a 7 ag  ag  a 10 

* ~TTTTTT7.  ^ 

64  bg  be  67  bg  bg  feio  fen 

Let us call the chain $c_1 < \cdots < c_6 < a_4 < \cdots < a_{10}$ the main chain. We can insert $b_5$ into its
proper place in the main chain, using three comparisons (first comparing it to
$c_4$, then $c_2$ or $c_6$, etc.); then $b_4$ can be moved into the main chain in three more
steps, leading to

$$d_1 < d_2 < \cdots < d_{10} < a_6 < a_7 < \cdots < a_{10}; \qquad b_i < a_i \ \text{for } 6 \le i \le 10; \qquad b_{11} \text{ isolated}. \tag{9}$$


The next step is crucial; is it clear what to do? We insert $b_{11}$ (not $b_7$) into the
main chain, using only four comparisons. Then $b_{10}, b_9, b_8, b_7, b_6$ (in this order)
can also be inserted into their proper places in the main chain, using at most
four comparisons each.

A careful count of the comparisons involved here shows that the 21 elements
have been sorted in at most $10 + S(10) + 2 + 2 + 3 + 3 + 4 + 4 + 4 + 4 + 4 + 4 = 66$
steps. Since

$$2^{65} < 21! < 2^{66},$$

we also know that no fewer than 66 would be possible in any event; hence

$$S(21) = 66. \tag{10}$$

(Binary  insertion  would  have  required  74  comparisons.) 

In  general,  merge  insertion  proceeds  as  follows  for  n elements: 

i) Make pairwise comparisons of $\lfloor n/2 \rfloor$ disjoint pairs of elements. (If n is odd,
leave one element out.)

ii) Sort the $\lfloor n/2 \rfloor$ larger numbers, found in step (i), by merge insertion.

iii) Name the elements $a_1, a_2, \ldots, a_{\lfloor n/2 \rfloor}, b_1, b_2, \ldots, b_{\lceil n/2 \rceil}$ as in (7), where
$a_1 < a_2 < \cdots < a_{\lfloor n/2 \rfloor}$ and $b_i < a_i$ for $1 \le i \le \lfloor n/2 \rfloor$; call $b_1$ and the a’s the
“main chain.” Insert the remaining b’s into the main chain, using binary
insertion, in the following order, leaving out all $b_j$ for $j > \lceil n/2 \rceil$:

$$b_3, b_2;\ b_5, b_4;\ b_{11}, b_{10}, \ldots, b_6;\ \ldots;\ b_{t_k}, b_{t_k - 1}, \ldots, b_{t_{k-1}+1};\ \ldots \tag{11}$$

We wish to define the sequence $(t_1, t_2, t_3, t_4, \ldots) = (1, 3, 5, 11, \ldots)$, which
appears in (11), in such a way that each of $b_{t_k}, b_{t_k - 1}, \ldots, b_{t_{k-1}+1}$ can be inserted
into the main chain with at most k comparisons. Generalizing (7), (8), and (9),
we obtain the diagram

$$x_1 < x_2 < \cdots < x_{2t_{k-1}} < a_{t_{k-1}+1} < a_{t_{k-1}+2} < \cdots < a_{t_k - 1} < a_{t_k}; \qquad b_j < a_j \ \text{for } t_{k-1} < j \le t_k,$$

where the main chain up to and including $a_{t_k - 1}$ contains $2t_{k-1} + (t_k - t_{k-1} - 1)$
elements. This number must be less than $2^k$; our best bet is to set it equal to
$2^k - 1$, so that

$$t_{k-1} + t_k = 2^k. \tag{12}$$

Since $t_1 = 1$, we may set $t_0 = 1$ for convenience, and we find that

$$t_k = 2^k - t_{k-1} = 2^k - 2^{k-1} + t_{k-2} = \cdots = 2^k - 2^{k-1} + \cdots + (-1)^k 2^0 = \bigl(2^{k+1} + (-1)^k\bigr)/3 \tag{13}$$

by  summing  a geometric  series.  (Curiously,  this  same  sequence  arose  in  our 
study  of  an  algorithm  for  calculating  the  greatest  common  divisor  of  two  integers; 
see  exercise  4.5.2-36.) 
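Steps (i) through (iii), with the insertion order (11) generated from (13), can be sketched in Python. This implementation was written for this discussion and is not code from Ford and Johnson. The function names are hypothetical, keys are assumed distinct, and `less` is the only operation that touches keys, so wrapping it in a counter measures the comparisons.

```python
from itertools import permutations

def insertion_order(m):
    """Indices j >= 2 of the b_j in the order (11): 3, 2; 5, 4; 11, ..., 6; ...
    using t_k = (2**(k+1) + (-1)**k) // 3 from (13); indices above m are skipped."""
    order, k, prev = [], 2, 1              # prev holds t_{k-1}, starting from t_1 = 1
    while prev < m:
        t = (2 ** (k + 1) + (-1) ** k) // 3
        order.extend(range(min(t, m), prev, -1))
        prev, k = t, k + 1
    return order

def merge_insertion_sort(items, less):
    """Merge insertion following steps (i)-(iii) of the text."""
    n = len(items)
    if n <= 1:
        return list(items)
    # (i) compare floor(n/2) disjoint pairs, storing each as (smaller, larger)
    pairs = []
    for i in range(0, n - 1, 2):
        x, y = items[i], items[i + 1]
        pairs.append((y, x) if less(y, x) else (x, y))
    # (ii) sort the pairs by their larger elements, recursively by merge insertion
    pairs = merge_insertion_sort(pairs, lambda p, q: less(p[1], q[1]))
    a = [p[1] for p in pairs]              # a_1 < a_2 < ... < a_floor(n/2)
    b = [p[0] for p in pairs]              # b_i < a_i
    if n % 2:
        b.append(items[-1])                # the element left out in step (i)
    # (iii) main chain b_1, a_1, ..., a_floor(n/2); pos[i] = index of a_{i+1} in it
    chain = [b[0]] + a
    pos = list(range(1, len(a) + 1))
    for j in insertion_order(len(b)):
        x = b[j - 1]
        # b_j < a_j, so only the part of the chain before a_j is searched
        lo, hi = 0, (pos[j - 1] if j <= len(a) else len(chain))
        while lo < hi:                     # binary insertion
            mid = (lo + hi) // 2
            if less(x, chain[mid]):
                hi = mid
            else:
                lo = mid + 1
        chain.insert(lo, x)
        pos = [p + 1 if p >= lo else p for p in pos]
    return chain

def sort_counting(keys):
    """Run merge insertion; return (sorted list, number of comparisons used)."""
    count = 0
    def less(x, y):
        nonlocal count
        count += 1
        return x < y
    return merge_insertion_sort(keys, less), count

if __name__ == "__main__":
    print(sort_counting([21, 4, 13, 8, 1, 17, 9, 30, 2, 11]))
```

Each $b_j$ is searched for only among the elements known to lie below $a_j$. That restriction keeps every insertion within the k comparisons allowed by (12).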

Let  F(n)  be  the  number  of  comparisons  required  to  sort  n elements  by  merge 
insertion.  Clearly 

$$F(n) = \lfloor n/2 \rfloor + F(\lfloor n/2 \rfloor) + G(\lceil n/2 \rceil), \tag{14}$$

where G represents the amount of work involved in step (iii). If $t_{k-1} \le m \le t_k$,
we have

$$G(m) = \sum_{j=1}^{k-1} j\,(t_j - t_{j-1}) + k\,(m - t_{k-1}) = km - (t_0 + t_1 + \cdots + t_{k-1}), \tag{15}$$
summing by parts. Let us set

$$w_k = t_0 + t_1 + \cdots + t_{k-1} = \lfloor 2^{k+1}/3 \rfloor, \tag{16}$$

so that $(w_0, w_1, w_2, w_3, w_4, w_5, \ldots) = (0, 1, 2, 5, 10, 21, \ldots)$. Exercise 13 shows that

$$F(n) - F(n-1) = k \quad \text{if and only if} \quad w_k < n \le w_{k+1}, \tag{17}$$

and the latter condition is equivalent to

$$\frac{2^{k+1}}{3} < n \le \frac{2^{k+2}}{3},$$

or $k + 1 < \lg 3n \le k + 2$; hence

$$F(n) - F(n-1) = \left\lceil \lg \tfrac{3}{4} n \right\rceil. \tag{18}$$

(This  formula  is  due  to  A.  Hadian  [Ph.D.  thesis,  Univ.  of  Minnesota  (1969), 
38-42].)  It  follows  that  F(n)  has  a remarkably  simple  expression, 

$$F(n) = \sum_{k=1}^{n} \left\lceil \lg \tfrac{3}{4} k \right\rceil, \tag{19}$$

quite  similar  to  the  corresponding  formula  (3)  for  binary  insertion.  A closed 
form  for  this  sum  appears  in  exercise  14. 

Equation  (19)  makes  it  easy  to  construct  a table  of  F(n);  we  have 

          n =  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    ⌈lg n!⌉ =  0  1  3  5  7 10 13 16 19 22 26 29 33 37 41 45 49
       F(n) =  0  1  3  5  7 10 13 16 19 22 26 30 34 38 42 46 50

          n =  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33
    ⌈lg n!⌉ =  53  57  62  66  70  75  80  84  89  94  98 103 108 113 118 123
       F(n) =  54  58  62  66  71  76  81  86  91  96 101 106 111 116 121 126

Notice that $F(n) = \lceil \lg n! \rceil$ for $1 \le n \le 11$ and for $20 \le n \le 21$, so we know that
merge insertion is optimum for those n:

$$S(n) = \lceil \lg n! \rceil = F(n) \quad \text{for } n = 1, \ldots, 11, 20, \text{ and } 21. \tag{20}$$
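The recurrence (14), with G taken from (15) and (16), and the closed sum (19) should give the same numbers. The Python sketch below (hypothetical names throughout) computes both, reproduces the table of F(n), and lists the n ≤ 33 for which F(n) meets the bound (1). It uses the integer identity ⌈lg(3k/4)⌉ = ⌈lg 3k⌉ − 2 = bit length of (3k − 1), minus 2.

```python
import math

def t(k):
    """t_k = (2**(k+1) + (-1)**k) / 3, from (13)."""
    return (2 ** (k + 1) + (-1) ** k) // 3

def w(k):
    """w_k = t_0 + ... + t_{k-1} = floor(2**(k+1) / 3), from (16)."""
    return 2 ** (k + 1) // 3

def G(m):
    """Cost of step (iii) from (15): k*m - w_k, where t_{k-1} <= m <= t_k."""
    k = 1
    while t(k) < m:
        k += 1
    return k * m - w(k)

def F_recurrence(n):
    """F(n) from the recurrence (14)."""
    if n <= 1:
        return 0
    return n // 2 + F_recurrence(n // 2) + G((n + 1) // 2)

def F_sum(n):
    """F(n) from (19): each term ceil(lg(3k/4)) equals (3k - 1).bit_length() - 2."""
    return sum((3 * k - 1).bit_length() - 2 for k in range(1, n + 1))

def info_lower_bound(n):
    """ceil(lg n!), computed exactly with integers."""
    return (math.factorial(n) - 1).bit_length()

if __name__ == "__main__":
    print([n for n in range(1, 34) if F_sum(n) == info_lower_bound(n)])
```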

Hugo  Steinhaus  posed  the  problem  of  finding  S(n)  in  the  second  edition  of  his 
classic  book  Mathematical  Snapshots  (Oxford  University  Press,  1950),  38-39.  He 
described  the  method  of  binary  insertion,  which  is  the  best  possible  way  to  sort  n 
objects if we start by sorting n - 1 of them first before the nth is considered; and
he conjectured that binary insertion would be optimum in general. Several years
later [Calcutta Math. Soc. Golden Jubilee Commemoration 2 (1959), 323-327],
he reported that two of his colleagues, S. Trybula and P. Czen, had “recently”
disproved his conjecture, and that they had determined S(n) for n ≤ 11. Trybula
and Czen may have independently discovered the method of merge insertion,
which was published soon afterwards by Ford and Johnson [AMM 66 (1959),
387-389].

After the discovery of merge insertion, the first unknown value of S(n) was
S(12). Table 1 shows that 12! is quite close to $2^{29}$, hence the existence of a
29-step sorting procedure for 12 elements is somewhat unlikely. An exhaustive
search (about 60 hours on a Maniac II computer) was therefore carried out by
Mark Wells, who discovered that S(12) = 30 [Proc. IFIP Congress 65 2 (1965),
497-498; Elements of Combinatorial Computing (Pergamon, 1971), 213-215].
Thus the merge insertion procedure turns out to be optimum for n = 12 as well.

Table 1
VALUES OF FACTORIALS IN BINARY NOTATION

     1! = (1)₂
     2! = (10)₂
     3! = (110)₂
     4! = (11000)₂
     5! = (1111000)₂
     6! = (1011010000)₂
     7! = (1001110110000)₂
     8! = (1001110110000000)₂
     9! = (1011000100110000000)₂
    10! = (1101110101111100000000)₂
    11! = (10011000010001010100000000)₂
    12! = (11100100011001111110000000000)₂
    13! = (101110011001010001100110000000000)₂
    14! = (1010001001100001110110010100000000000)₂
    15! = (10011000001110111011101110101100000000000)₂
    16! = (100110000011101110111011101011000000000000000)₂
    17! = (1010000110111111011101110110011011000000000000000)₂
    18! = (10110101111101110110011001010011100110000000000000000)₂
    19! = (110110000001010111001001100000110100010010000000000000000)₂
    20! = (10000111000011011001110111110010000010101101000000000000000000)₂

*A  slightly  deeper  analysis.  In  order  to  study  S(n)  more  carefully,  let  us  look 
more  closely  at  partial  ordering  diagrams  such  as  (5).  After  several  comparisons 
have  been  made,  we  can  represent  the  knowledge  we  have  gained  in  terms  of  a 
directed  graph.  This  directed  graph  contains  no  cycles,  in  view  of  the  transitivity 
of  the  < relation,  so  we  can  draw  it  in  such  a way  that  all  arcs  go  from  left  to 
right;  it  is  therefore  convenient  to  leave  arrows  off  the  diagram.  In  this  way  (5) 
becomes 


        b-----d
       /     /
      a     c     e          (21)


If G is such a directed graph, let T(G) be the number of permutations consistent
with G, that is, the number of ways to assign the integers {1, 2, ..., n} to the
vertices of G so that the number on vertex x is less than the number on vertex
y whenever x → y in G. For example, one of the permutations consistent with
(21)  has  a = 1,  b = 4,  c = 2,  d = 5,  e = 3.  We  have  studied  T(G)  for  various  G 
in  Section  5.1.4,  where  we  observed  that  T(G)  is  the  number  of  ways  in  which 
G can  be  sorted  topologically. 
If G is a graph on n elements that can be obtained after k comparisons, we
define the efficiency of G to be

$$E(G) = \frac{n!}{2^k\, T(G)}. \tag{22}$$

(This idea is due to Frank Hwang and Shen Lin.) Strictly speaking, the efficiency
is not a function of the graph G alone; it depends on the way we arrived at G
during a sorting process, but it is convenient to be a little careless in our language.
After making one more comparison, between elements i and j, we obtain two
graphs $G_1$ and $G_2$, one for the case $K_i < K_j$ and one for the case $K_i > K_j$.
Clearly

$$T(G) = T(G_1) + T(G_2).$$

If $T(G_1) \ge T(G_2)$, we have $T(G) \le 2T(G_1)$, hence

$$E(G_1) = \frac{n!}{2^{k+1}\, T(G_1)} = \frac{E(G)\, T(G)}{2T(G_1)} \le E(G). \tag{23}$$


Therefore  each  comparison  leads  to  at  least  one  graph  of  less  or  equal  efficiency; 
we  can’t  improve  the  efficiency  by  making  further  comparisons. 

When G has no arcs at all, we have k = 0 and T(G) = n!, so the initial
efficiency is 1. At the other extreme, when G is a graph representing the final
result of sorting, G looks like a straight line and T(G) = 1. Thus, for example,
if we want to find a sorting procedure that sorts five elements in at most seven
steps, we must obtain the linear graph •-•-•-•-•, whose efficiency is $5!/(2^7 \times 1) =
120/128 = 15/16$. It follows that all of the graphs arising in the sorting procedure
must have efficiency $\ge 15/16$; if any less efficient graph were to appear, at least one
of its descendants would also be less efficient, and we would ultimately reach
a linear graph whose efficiency is $< 15/16$. In general, this argument proves that
all graphs corresponding to the tree nodes of a sorting procedure for n elements
must have efficiency $\ge n!/2^l$, where l is the number of levels of the tree (not
counting external nodes). This is another way to prove that $S(n) \ge \lceil \lg n! \rceil$,
although the argument is not really much different from what we said before.

The graph (21) has efficiency 1, since T(G) = 15 and since G has been
obtained in three comparisons. In order to see what vertices should be compared
next, we can form the comparison matrix

$$\begin{array}{c|ccccc} & a & b & c & d & e \\ \hline a & \cdot & 15 & 10 & 15 & 11 \\ b & 0 & \cdot & 5 & 15 & 7 \\ c & 5 & 10 & \cdot & 15 & 9 \\ d & 0 & 0 & 0 & \cdot & 3 \\ e & 4 & 8 & 6 & 12 & \cdot \end{array} \tag{24}$$

where $C_{ij}$ is $T(G_1)$ for the graph $G_1$ obtained by adding the arc $i \to j$ to G.
For example, if we compare $K_c$ with $K_e$, the 15 permutations consistent with G
split up into $C_{ec} = 6$ having $K_e < K_c$ and $C_{ce} = 9$ having $K_c < K_e$. The
latter graph would have efficiency $15/(2 \times 9) = 5/6 < 15/16$, so it could not lead to a
seven-step sorting procedure. The next comparison must be $K_b{:}K_e$ in order to
keep the efficiency $\ge 15/16$.
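For graphs this small, T(G), the efficiency (22), and the comparison matrix (24) can be computed by brute force over all n! labelings. The Python sketch below (hypothetical names) encodes graph (21) as a list of arcs. It confirms that $K_b{:}K_e$ is the only comparison whose worse outcome still leaves at most 8 consistent permutations, that is, efficiency at least 15/16.

```python
import math
from fractions import Fraction
from itertools import permutations

VERTICES = "abcde"
GRAPH_21 = [("a", "b"), ("b", "d"), ("c", "d")]   # (21): a < b < d and c < d; e is isolated

def T(arcs, vertices=VERTICES):
    """Number of permutations consistent with the graph, by brute force over all labelings."""
    total = 0
    for labels in permutations(range(1, len(vertices) + 1)):
        value = dict(zip(vertices, labels))
        total += all(value[x] < value[y] for x, y in arcs)
    return total

def efficiency(arcs, k, vertices=VERTICES):
    """E(G) = n!/(2**k T(G)) from (22); the graph must have T(G) > 0."""
    return Fraction(math.factorial(len(vertices)), 2 ** k * T(arcs, vertices))

def comparison_matrix(arcs, vertices=VERTICES):
    """C[i][j] = T(G with the arc i -> j added), as in (24)."""
    return {i: {j: T(arcs + [(i, j)], vertices) for j in vertices if j != i}
            for i in vertices}

if __name__ == "__main__":
    C = comparison_matrix(GRAPH_21)
    for i in VERTICES:
        print(i, [C[i].get(j, "-") for j in VERTICES])
```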

The  concept  of  efficiency  is  especially  useful  when  we  consider  the  connected 
components  of  graphs.  Consider  for  example  the  graph 


a b d e 


f 9 


with no arcs connecting G' to G'', so it has been formed by making some
comparisons entirely within G' and others entirely within G''. In general, assume
that $G = G' \oplus G''$ has no arcs between G' and G'', where G' and G'' have
respectively n' and n'' vertices; it is easy to see that

$$T(G) = \binom{n' + n''}{n'}\, T(G')\, T(G''), \tag{25}$$


since  each  consistent  permutation  of  G is  obtained  by  choosing  n'  elements 
to  assign  to  G'  and  then  making  consistent  permutations  within  G'  and  G" 
independently.  If  k'  comparisons  have  been  made  within  G'  and  k"  within  G", 
we  have  the  basic  result 


$$E(G) = \frac{(n' + n'')!}{2^{k'+k''}\, T(G)} = \frac{n'!}{2^{k'}\, T(G')} \cdot \frac{n''!}{2^{k''}\, T(G'')} = E(G')\, E(G''), \tag{26}$$


showing  that  the  efficiency  of  a graph  is  related  in  a simple  way  to  the  efficiency 
of  its  components.  Therefore  we  may  restrict  consideration  to  graphs  having 
only  one  component. 
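Equations (25) and (26) can be checked by brute force on a small example. The two components in this Python sketch are hypothetical, chosen only for illustration: G' has arcs a → b and a → c, reached with k' = 2 comparisons, and G'' has the single arc x → y, reached with k'' = 1 comparison.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def count_consistent(vertices, arcs):
    """T(G): number of ways to label the vertices 1..n so that x < y for every arc x -> y."""
    total = 0
    for labels in permutations(range(1, len(vertices) + 1)):
        value = dict(zip(vertices, labels))
        total += all(value[x] < value[y] for x, y in arcs)
    return total

def efficiency(vertices, arcs, k):
    """E(G) = n!/(2**k T(G)), as in (22)."""
    return Fraction(factorial(len(vertices)), 2 ** k * count_consistent(vertices, arcs))

# Hypothetical components G' and G'' with no arcs between them.
V1, A1 = ["a", "b", "c"], [("a", "b"), ("a", "c")]
V2, A2 = ["x", "y"], [("x", "y")]

if __name__ == "__main__":
    t1, t2 = count_consistent(V1, A1), count_consistent(V2, A2)
    t = count_consistent(V1 + V2, A1 + A2)
    print(t, comb(len(V1) + len(V2), len(V1)) * t1 * t2)          # both sides of (25)
    print(efficiency(V1 + V2, A1 + A2, 3),
          efficiency(V1, A1, 2) * efficiency(V2, A2, 1))           # both sides of (26)
```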

Now  suppose  that  G'  and  G"  are  one-component  graphs,  and  suppose  that 
we  want  to  hook  them  together  by  comparing  a vertex  x of  G'  with  a vertex  y 
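of G''. We want to know how efficient this will be. For this purpose we need a
function that can be denoted by

$$\begin{pmatrix} p & < & q \\ m & & n \end{pmatrix}, \tag{27}$$

defined to be the number of permutations consistent with the graph

$$a_1 < a_2 < \cdots < a_m, \qquad b_1 < b_2 < \cdots < b_n, \qquad a_p < b_q. \tag{28}$$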
Thus $\begin{pmatrix} p & < & q \\ m & & n \end{pmatrix}$ is $\binom{m+n}{m}$ times the probability that the pth smallest of a set of
m numbers is less than the qth smallest of an independently chosen set of n
numbers. Exercise 17 shows that we can express $\begin{pmatrix} p & < & q \\ m & & n \end{pmatrix}$ in two ways in terms
of binomial coefficients,


P Q 

m n 


)=  E ( 


to  — p + n — k\  ( p — 1 + k' 


0<k<q 


E ( 

p<j<m 


lP~  1 + K\ 
TO  — p J V p — 1 / 

n-q  + m-j\fq-l+j\ 
n — q J V q— 1 / 


(29) 


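(Incidentally, it is by no means obvious on algebraic grounds that these two sums of products of binomial coefficients should come out to be equal.) We also have the formulas

$$\binom{p<q}{m\quad n} + \binom{q<p}{n\quad m} = \binom{m+n}{m}, \qquad (30)$$

$$\binom{q<p}{n\quad m} = \binom{m+1-p \,<\, n+1-q}{m \qquad\quad n}, \qquad (31)$$

$$\binom{p<q}{m\quad n} = \binom{p<q}{m-1\quad n} + \binom{p<q}{m\quad n-1} + [p \le m]\,[q = n]\binom{m+n-1}{m},$$

where $\binom{p<q}{m\ \ n}$ is taken to be zero when $p > m$ or $q > n$.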
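
These identities are easy to spot-check by machine. The sketch below (Python, brute force, hypothetical helper names) counts interleavings of the two chains of (28) directly and compares the count with both sums in (29), with (30) and (31), and with the recurrence. As an assumption of this sketch, $\binom{p<q}{m\ n}$ is taken to be 0 when $p > m$ or $q > n$.

```python
from itertools import combinations
from math import comb


def pq(p, q, m, n):
    """(p<q; m n) by brute force; 0 when p > m or q > n (convention assumed here)."""
    if p > m or q > n:
        return 0
    total = 0
    for a_pos in combinations(range(m + n), m):   # positions of a_1..a_m, increasing
        taken = set(a_pos)
        b_pos = [i for i in range(m + n) if i not in taken]
        if a_pos[p - 1] < b_pos[q - 1]:           # a_p precedes b_q
            total += 1
    return total


def sum_over_k(p, q, m, n):   # first form of (29): k = number of b's before a_p
    return sum(comb(m - p + n - k, m - p) * comb(p - 1 + k, p - 1) for k in range(q))


def sum_over_j(p, q, m, n):   # second form of (29): j = number of a's before b_q
    return sum(comb(n - q + m - j, n - q) * comb(q - 1 + j, q - 1)
               for j in range(p, m + 1))


for m in range(1, 6):
    for n in range(1, 6):
        for p in range(1, m + 1):
            for q in range(1, n + 1):
                v = pq(p, q, m, n)
                assert v == sum_over_k(p, q, m, n) == sum_over_j(p, q, m, n)   # (29)
                assert v + pq(q, p, n, m) == comb(m + n, m)                    # (30)
                assert pq(q, p, n, m) == pq(m + 1 - p, n + 1 - q, m, n)        # (31)
                assert v == (pq(p, q, m - 1, n) + pq(p, q, m, n - 1)
                             + (q == n) * comb(m + n - 1, m))                  # recurrence
print("all identities hold for m, n <= 5")
```
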

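For definiteness, let us now consider the two graphs

[Graph (32): the seven-vertex graph $G'$, with vertices $x_1, \ldots, x_7$]

[Graph (33): the four-vertex graph $G''$, with vertices $y_1, \ldots, y_4$]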

It is not hard to show by direct enumeration that $T(G') = 42$ and $T(G'') = 5$; so if $G$ is the 11-vertex graph having $G'$ and $G''$ as components, we have $T(G) = \binom{11}{4} \cdot 42 \cdot 5 = 69300$ by Eq. (25). This is a formidable number of permutations to list, if we want to know how many of them have $x_i < y_j$ for each $i$ and $j$. But the calculation can be done by hand, in less than an hour, as follows. We form the matrices $A(G')$ and $A(G'')$, where $A_{ik}$ is the number of consistent permutations of $G'$ (or $G''$) in which $x_i$ (or $y_i$) is equal to $k$. Thus the number of permutations of $G$ in which $x_i$ is less than $y_j$ is the $(i,p)$ element of $A(G')$ times $\binom{p<q}{7\quad 4}$ times the $(j,q)$ element of $A(G'')$, summed over $1 \le p \le 7$ and $1 \le q \le 4$. In other words, we want to form the matrix product $A(G') \cdot L \cdot A(G'')^T$, where $L_{pq} = \binom{p<q}{7\quad 4}$. This comes to

Fig. 36. Some graphs and their efficiencies, obtained at the beginning of a long proof that $S(12) > 29$.


Thus the "best" way to hook up $G'$ and $G''$ is to compare $x_1$ with $y_2$; this gives 42042 cases with $x_1 < y_2$ and $69300 - 42042 = 27258$ cases with $x_1 > y_2$. (By symmetry, we could also compare $x_3$ with $y_2$, $x_5$ with $y_3$, or $x_7$ with $y_3$, leading to essentially the same results.) The efficiency of the resulting graph for $x_1 < y_2$ is

$$\frac{69300}{84084}\, E(G')\, E(G''),$$

which is none too good; hence it is probably a bad idea to hook $G'$ up with $G''$ in any sorting method. The point of this example is that we are able to make such a decision without excessive calculation.
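
The matrix-product method described above can be checked on small cases. In this sketch the two components are small hypothetical stand-ins, not the graphs (32) and (33); ranks are 1-based as in the text, and `L_entry` uses the first sum of (29). The brute-force count over the combined graph should agree with $A(G') \cdot L \cdot A(G'')^T$ entry by entry.

```python
from itertools import permutations
from math import comb


def rankings(n, arcs):
    """All rank vectors (rank[v] in 1..n) of vertices 0..n-1 that respect every arc u -> v."""
    result = []
    for perm in permutations(range(n)):
        rank = [0] * n
        for r, v in enumerate(perm, start=1):
            rank[v] = r
        if all(rank[u] < rank[v] for u, v in arcs):
            result.append(rank)
    return result


def A_matrix(n, arcs):
    """A[i][k-1] = number of consistent permutations in which vertex i has rank k."""
    A = [[0] * n for _ in range(n)]
    for rank in rankings(n, arcs):
        for i in range(n):
            A[i][rank[i] - 1] += 1
    return A


def L_entry(p, q, m, n):
    """(p<q; m n), taken from the first sum in (29)."""
    return sum(comb(m - p + n - k, m - p) * comb(p - 1 + k, p - 1) for k in range(q))


def hookup_counts(m, arcs1, n, arcs2):
    """C[i][j] = permutations of G' (+) G'' with x_i < y_j, computed as A(G') L A(G'')^T."""
    A1, A2 = A_matrix(m, arcs1), A_matrix(n, arcs2)
    L = [[L_entry(p, q, m, n) for q in range(1, n + 1)] for p in range(1, m + 1)]
    return [[sum(A1[i][p] * L[p][q] * A2[j][q] for p in range(m) for q in range(n))
             for j in range(n)] for i in range(m)]


def brute_counts(m, arcs1, n, arcs2):
    """The same counts, by listing every consistent permutation of the combined graph."""
    arcs = arcs1 + [(m + u, m + v) for u, v in arcs2]
    C = [[0] * n for _ in range(m)]
    for rank in rankings(m + n, arcs):
        for i in range(m):
            for j in range(n):
                C[i][j] += rank[i] < rank[m + j]
    return C


g1 = [(0, 1), (0, 2)]      # hypothetical G': x_0 < x_1, x_0 < x_2
g2 = [(0, 1), (1, 2)]      # hypothetical G'': y_0 < y_1 < y_2
assert hookup_counts(3, g1, 3, g2) == brute_counts(3, g1, 3, g2)
```

The point of the matrix form is that the work depends only on the sizes of $G'$ and $G''$ once $A(G')$ and $A(G'')$ are known, not on the number of permutations of $G$.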

These ideas can be used to provide independent confirmation of Mark Wells's proof that $S(12) = 30$. Starting with a graph containing one vertex, we can repeatedly try to add a comparison to one of our graphs $G$ or to $G' \oplus G''$ (a pair of graph components $G'$ and $G''$) in such a way that the two resulting graphs have 12 or fewer vertices and efficiency $\ge 12!/2^{29} \approx 0.89221$. Whenever this is possible, we take the resulting graph of least efficiency and add it to our set, unless one of the two graphs is isomorphic to a graph we have already included. If both of the resulting graphs have the same efficiency, we arbitrarily choose one of them. A graph can be identified with its dual (obtained by reversing the order), so long as we consider adding comparisons to $G' \oplus \mathrm{dual}(G'')$ as well as to $G' \oplus G''$. A few of the smallest graphs obtained in this way are displayed in Fig. 36 together with their efficiencies.

Exactly 1649 graphs were generated, by computer, before this process terminated. Since the linear graph on 12 vertices (a single chain) was not obtained, we may conclude that $S(12) > 29$. It is plausible that a similar experiment could be performed to deduce that $S(22) > 70$ in a fairly reasonable amount of time, since $22!/2^{70} \approx 0.952$ requires extremely high efficiency to sort in 70 steps. (Only 91 of the 1649 graphs found on 12 or fewer vertices had such high efficiency.)
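
The thresholds quoted above follow from a one-line argument: a graph $G$ reached after $k$ comparisons still needs at least $\lceil \lg T(G) \rceil$ more, so finishing within $s$ comparisons in all requires $2^k T(G) \le 2^s$, that is, $E(G) \ge n!/2^s$. A small sketch of the arithmetic (the function name is ours):

```python
from fractions import Fraction
from math import factorial


def efficiency_floor(n, steps):
    """n!/2^steps: the least efficiency a graph on n vertices may have
    if the whole sort is still to finish within `steps` comparisons."""
    return Fraction(factorial(n), 2**steps)


print(float(efficiency_floor(12, 29)))   # about 0.89221
print(float(efficiency_floor(22, 70)))   # about 0.952
print(efficiency_floor(5, 7))            # 15/16
print(float(Fraction(69300, 84084)))     # factor for the x_1 < y_2 hookup, about 0.824
```
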




Marcin Peczarski [see Algorithmica 40 (2004), 133–145; Information Proc. Letters 101 (2007), 126–128] extended Wells's method and proved that $S(13) = 34$, $S(14) = 38$, $S(15) = 42$, $S(22) = 71$; thus merge insertion is optimum in those cases as well. Intuitively, it seems likely that $S(16)$ will some day be shown to be less than $F(16)$, since $F(16)$ involves no fewer steps than sorting ten elements with $S(10)$ comparisons and then inserting six others by binary insertion, one at a time. There must be a way to improve upon this! But at present, the smallest case where $F(n)$ is definitely known to be nonoptimum is $n = 47$: After sorting 5 and 42 elements with $F(5) + F(42) = 178$ comparisons, we can merge the results with 22 further comparisons, using a method due to J. Schulte Mönting, Theoretical Comp. Sci. 14 (1981), 19–37; this strategy beats $F(47) = 201$. (Glenn K. Manacher [JACM 26 (1979), 441–456] had previously proved that infinitely many $n$ exist with $S(n) < F(n)$, starting with $n = 189$.)
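
The numbers quoted for $n = 47$ can be reproduced from the closed form $F(n) = \sum_{1 \le k \le n} \lceil \lg \frac34 k \rceil$ for the worst case of merge insertion; that formula is assumed here rather than derived in this passage. The sketch uses integer arithmetic only, since $\lceil \lg \frac34 k \rceil$ equals the smallest $t$ with $2^t \ge 3k$, minus 2.

```python
from math import factorial


def F(n):
    """Merge insertion's worst case via the closed form sum_{k=1}^{n} ceil(lg(3k/4)).

    Integer-only: ceil(lg(3k/4)) = (smallest t with 2^t >= 3k) - 2 = (3k - 1).bit_length() - 2.
    """
    return sum((3 * k - 1).bit_length() - 2 for k in range(1, n + 1))


def ceil_lg_factorial(n):
    """The information-theoretic bound ceil(lg n!)."""
    return (factorial(n) - 1).bit_length()


print(F(5), F(42), F(47))          # 7 171 201
print(F(5) + F(42) + 22)           # 200, one less than F(47)
print(max(n for n in range(1, 200) if F(n) == ceil_lg_factorial(n)))   # 21
```

The last line shows that $F(n)$ meets $\lceil \lg n! \rceil$ for the last time at $n = 21$, in line with exercise 16.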


The average number of comparisons. So far we have been considering procedures that are best possible in the sense that their worst case isn't bad; in other words, we have looked for "minimax" procedures that minimize the maximum number of comparisons. Now let us look for a "minimean" procedure that minimizes the average number of comparisons, assuming that the input is random so that each permutation is equally likely.

Consider once again the tree representation of a sorting procedure, as shown in Fig. 34. The average number of comparisons in that tree is

$$\frac{2+3+3+3+3+2}{6} = 2\tfrac{2}{3},$$

averaging over all permutations. In general, the average number of comparisons in a sorting method is the external path length of the tree divided by $n!$. (Recall that the external path length is the sum of the distances from the root to each of the external nodes; see Section 2.3.4.5.) It is easy to see from the considerations of Section 2.3.4.5 that the minimum external path length occurs in a binary tree with $N$ external nodes if there are $2^q - N$ external nodes at level $q - 1$ and $2N - 2^q$ at level $q$, where $q = \lceil \lg N \rceil$. (The root is at level zero.) The minimum external path length is therefore

$$(q-1)(2^q - N) + q(2N - 2^q) = (q+1)N - 2^q. \qquad (34)$$

The minimum path length can also be characterized in another interesting way: An extended binary tree has minimum external path length for a given number of external nodes if and only if there is a number $l$ such that all external nodes appear on levels $l$ and $l + 1$. (See exercise 20.)
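
Formula (34) can be confirmed by a short dynamic program: an extended binary tree with $N$ external nodes splits at its root into subtrees with $a$ and $N - a$ external nodes, and every external node then lies one level deeper. A minimal Python sketch (names are illustrative):

```python
def min_epl_formula(N):
    """Eq. (34): (q + 1)N - 2^q with q = ceil(lg N)."""
    q = (N - 1).bit_length()          # ceil(lg N); gives q = 0 when N = 1
    return (q + 1) * N - 2**q


def min_epl_table(limit):
    """best[N] = least external path length over extended binary trees with N external nodes."""
    best = [0, 0]                      # best[0] is unused; best[1] = 0 (a lone external node)
    for N in range(2, limit + 1):
        # the root's two subtrees hold a and N - a external nodes, each pushed one level deeper
        best.append(N + min(best[a] + best[N - a] for a in range(1, N)))
    return best


best = min_epl_table(200)
assert all(best[N] == min_epl_formula(N) for N in range(1, 201))
print(best[6])   # 16, the path length 2+3+3+3+3+2 of the Fig. 34 tree
```
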

If we set $q = \lg N + \theta$, where $0 \le \theta < 1$, the formula for minimum external path length becomes

$$N(\lg N + 1 + \theta - 2^\theta). \qquad (35)$$

The function $1 + \theta - 2^\theta$ is shown in Fig. 37; for $0 < \theta < 1$ it is positive but very small, never exceeding

$$1 - \frac{1 + \ln\ln 2}{\ln 2} = 0.08607\,13320\,55934+. \qquad (36)$$
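
The bound (36) comes from maximizing $g(\theta) = 1 + \theta - 2^\theta$: setting $g'(\theta) = 1 - 2^\theta \ln 2$ to zero gives $\theta = -\ln\ln 2/\ln 2 \approx 0.5288$. A quick numerical check in Python, comparing the closed form against a fine grid:

```python
import math


def g(theta):
    return 1 + theta - 2**theta


ln2 = math.log(2)
theta_star = -math.log(ln2) / ln2                 # root of g'(theta) = 1 - 2**theta * ln 2
closed_form = 1 - (1 + math.log(ln2)) / ln2       # right-hand side of (36)
grid_max = max(g(i / 100_000) for i in range(100_001))

print(theta_star)                                 # about 0.5288
print(closed_form, g(theta_star), grid_max)       # all about 0.0860713
```
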
Fig. 37. The function $1 + \theta - 2^\theta$.


Thus the minimum possible average number of comparisons, obtained by dividing (35) by $N$, is never less than $\lg N$ and never more than $\lg N + 0.0861$. [This result was first obtained by A. Gleason in an internal IBM memorandum (1956).]

Now if we set $N = n!$, we get a lower bound for the average number of comparisons in any sorting scheme. Asymptotically speaking, this lower bound is

$$\lg n! + O(1) = n \lg n - n/\ln 2 + O(\log n). \qquad (37)$$
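
By Stirling's approximation, the $O(\log n)$ term in (37) is $\frac12 \lg(2\pi n) + O(1/n)$. This sketch (Python, using `math.lgamma` for $\ln n!$) checks that numerically:

```python
import math


def lg_factorial(n):
    """lg n!, computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


for n in (100, 1000, 10**6):
    main_terms = n * math.log2(n) - n / math.log(2)
    print(n, lg_factorial(n) - main_terms, 0.5 * math.log2(2 * math.pi * n))
```
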

Let $\overline{F}(n)$ be the average number of comparisons performed by the merge insertion algorithm; we have

n                 =  1   2   3    4    5     6      7       8
lower bound (34)  =  0   2  16  112  832  6896  62368  619904
n! F̄(n)           =  0   2  16  112  832  6912  62784  623232
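
The "lower bound (34)" row of the table is just (34) with $N = n!$; a few lines of Python reproduce it exactly:

```python
from math import factorial


def min_epl(N):
    q = (N - 1).bit_length()     # q = ceil(lg N)
    return (q + 1) * N - 2**q    # Eq. (34)


print([min_epl(factorial(n)) for n in range(1, 9)])
# [0, 2, 16, 112, 832, 6896, 62368, 619904]
```

Dividing by $n!$ gives the corresponding average; for $n = 6$ that is $6896/720 \approx 9.578$.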

Thus merge insertion is optimum in both senses for $n \le 5$, but for $n = 6$ it averages $6912/720 = 9.6$ comparisons while our lower bound says that an average of $6896/720 = 9.5777\ldots$ comparisons might be possible. A moment's reflection shows why this is true: Some "fortunate" permutations of six elements are sorted by merge insertion after only eight comparisons, so the comparison tree has external nodes appearing on three levels instead of two. This forces the overall path length to be higher. Exercise 24 shows that it is possible to construct a six-element sorting procedure that requires nine or ten comparisons in each case; it follows that this method is superior to merge insertion, on the average, and no worse than merge insertion in its worst case.

When $n = 7$, Y. Cesari [Thesis (Univ. of Paris, 1968), page 37] has shown that no sorting method can attain the lower bound 62368 on external path length. (It is possible to prove this fact without a computer, using the results of exercise 22.) On the other hand, he has constructed procedures that do achieve the lower bound (34) when $n = 9$ or 10. In general, the problem of minimizing the average number of comparisons turns out to be substantially more difficult than the problem of determining $S(n)$. It may even be true that, for some $n$, all methods that minimize the average number of comparisons require more than $S(n)$ comparisons in their worst case.

EXERCISES

1. [20] Draw the comparison trees for sorting four elements using the method of (a) binary insertion; (b) straight two-way merging. What are the external path lengths of these trees?

2. [M24] Prove that $B(n) \le L(n)$, and find all $n$ for which equality holds.

3. [M22] (Weak orderings.) When equality between keys is allowed, there are 13 possible outcomes when sorting three elements:

$K_1 = K_2 = K_3$, $K_1 = K_2 < K_3$, $K_1 = K_3 < K_2$, $K_2 = K_3 < K_1$, $K_1 < K_2 = K_3$, $K_2 < K_1 = K_3$, $K_3 < K_1 = K_2$, $K_1 < K_2 < K_3$, $K_1 < K_3 < K_2$, $K_2 < K_1 < K_3$, $K_2 < K_3 < K_1$, $K_3 < K_1 < K_2$, $K_3 < K_2 < K_1$.

Let $P_n$ denote the number of possible outcomes when $n$ elements are sorted with ties allowed, so that $(P_0, P_1, P_2, P_3, P_4, P_5, \ldots) = (1, 1, 3, 13, 75, 541, \ldots)$. Prove that the generating function $P(z) = \sum_{n \ge 0} P_n z^n/n!$ is equal to $1/(2 - e^z)$. Hint: Show that

$$P_n = \sum_{k > 0} \binom{n}{k} P_{n-k} \qquad \text{when } n > 0.$$

4. [HM27] (O. A. Gross.) Determine the asymptotic value of the numbers $P_n$ of exercise 3, as $n \to \infty$. [Possible hint: Consider the partial fraction expansion of $\cot z$.]

5. [16] When keys can be equal, each comparison may have three results instead of two: $K_i < K_j$, $K_i = K_j$, $K_i > K_j$. Sorting algorithms for this general situation can be represented as extended ternary trees, in which each internal node $i:j$ has three subtrees; the left, middle, and right subtrees correspond respectively to the three possible outcomes of the comparison.

Draw an extended ternary tree that defines a sorting algorithm for $n = 3$, when equal keys are allowed. There should be 13 external nodes, corresponding to the 13 possible outcomes listed in exercise 3.

► 6. [M22] Let $S'(n)$ be the minimum number of comparisons necessary to sort $n$ elements and to determine all equalities between keys, when each comparison has three outcomes as in exercise 5. The information-theoretic argument of the text can readily be generalized to show that $S'(n) \ge \lceil \log_3 P_n \rceil$, where $P_n$ is the function studied in exercises 3 and 4; but prove that, in fact, $S'(n) = S(n)$.

7. [20] Draw an extended ternary tree in the sense of exercise 5 for sorting four elements, when it is known that all keys are either 0 or 1. (Thus if $K_1 < K_2$ and $K_3 < K_4$, we know that $K_1 = K_3$ and $K_2 = K_4$!) Use the minimum average number of comparisons, assuming that the $2^4$ possible inputs are equally likely. Be sure to determine all equalities that are present; for example, don't stop sorting when you know only that $K_1 \le K_2 \le K_3 \le K_4$.

8. [26] Draw an extended ternary tree as in exercise 7 for sorting four elements, when it is known that all keys are either $-1$, 0, or $+1$. Use the minimum average number of comparisons, assuming that the $3^4$ possible inputs are equally likely.

9. [M20] When sorting $n$ elements as in exercise 7, knowing that all keys are 0 or 1, what is the minimum number of comparisons in the worst case?

► 10. [M25] When sorting $n$ elements as in exercise 7, knowing that all keys are 0 or 1, what is the minimum average number of comparisons as a function of $n$?

11. [HM27] When sorting $n$ elements as in exercise 5, and knowing that all keys are members of the set $\{1, 2, \ldots, m\}$, let $S_m(n)$ be the minimum number of comparisons needed in the worst case. [Thus by exercise 6, $S_n(n) = S(n)$.] Prove that, for fixed $m$, $S_m(n)$ is asymptotically $n \lg m + O(1)$ as $n \to \infty$.

► 12. [M25] (W. G. Bouricius, circa 1954.) Suppose that equal keys may occur, but we merely want to sort the elements $\{K_1, K_2, \ldots, K_n\}$ so that a permutation $a_1 a_2 \ldots a_n$ is determined with $K_{a_1} \le K_{a_2} \le \cdots \le K_{a_n}$; we do not need to know whether or not equality occurs between $K_{a_i}$ and $K_{a_{i+1}}$.

Let us say that a comparison tree sorts a sequence of keys strongly if it will sort the sequence in the stated sense no matter which branch is taken below the nodes $i:j$ for which $K_i = K_j$. (The tree is binary, not ternary.)

a) Prove that a comparison tree with no redundant comparisons sorts every sequence of keys strongly if and only if it sorts every sequence of distinct keys.

b) Prove that a comparison tree sorts every sequence of keys strongly if and only if it sorts every sequence of zeros and ones strongly.

13. [M28] Prove (17).

14. [M24] Find a closed form for the sum (19).

15. [M21] Determine the asymptotic behavior of $B(n)$ and $F(n)$ up to $O(\log n)$. [Hint: Show that in both cases the coefficient of $n$ involves the function shown in Fig. 37.]

16. [HM26] (F. Hwang and S. Lin.) Prove that $F(n) > \lceil \lg n! \rceil$ for $n \ge 22$.

17. [M20] Prove (29).

18. [20] If the procedure whose first steps are shown in Fig. 36 had produced the linear graph on 12 vertices (a single chain) with efficiency $12!/2^{29}$, would this have proved that $S(12) = 29$?

19. [40] Experiment with the following heuristic rule for deciding which pair of elements to compare next while designing a comparison tree: At each stage of sorting $\{K_1, \ldots, K_n\}$, let $u_i$ be the number of keys known to be $\le K_i$ as a result of the comparisons made so far, and let $v_i$ be the number of keys known to be $\ge K_i$, for $1 \le i \le n$. Renumber the keys in terms of increasing $u_i/v_i$, so that $u_1/v_1 \le u_2/v_2 \le \cdots \le u_n/v_n$. Now compare $K_i:K_{i+1}$ for some $i$ that minimizes $|u_i v_{i+1} - u_{i+1} v_i|$. (Although this method is based on far less information than a full comparison matrix as in (24), it appears to give optimum results in many cases.)
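
For readers who want to try exercise 19, here is one possible Python harness. Two details are our own assumptions, not part of the exercise: each count $u_i$, $v_i$ includes $K_i$ itself (so no $v_i$ is zero), and adjacent pairs whose order is already known are skipped. With these choices the procedure always terminates with a total order, because sorting by $u_i/v_i$ gives a linear extension of the known relations.

```python
from fractions import Fraction
from itertools import permutations


def counts(n, less):
    """u[i] = keys known to be <= K_i, v[i] = keys known to be >= K_i (each counting K_i itself)."""
    u = [1 + sum(less[j][i] for j in range(n)) for i in range(n)]
    v = [1 + sum(less[i][j] for j in range(n)) for i in range(n)]
    return u, v


def next_pair(n, less):
    """Adjacent pair, in increasing u/v order, minimizing |u_i v_{i+1} - u_{i+1} v_i|."""
    u, v = counts(n, less)
    order = sorted(range(n), key=lambda i: Fraction(u[i], v[i]))
    best = None
    for a, b in zip(order, order[1:]):
        if less[a][b] or less[b][a]:
            continue  # assumption: skip pairs whose relative order is already known
        score = abs(u[a] * v[b] - u[b] * v[a])
        if best is None or score < best[0]:
            best = (score, a, b)
    return None if best is None else best[1:]


def heuristic_sort(keys):
    """Sort distinct keys using only comparisons chosen by next_pair; returns (order, comparisons)."""
    n = len(keys)
    less = [[False] * n for _ in range(n)]   # less[i][j]: K_i < K_j known (kept transitively closed)
    comparisons = 0
    while (pair := next_pair(n, less)) is not None:
        a, b = pair
        comparisons += 1
        lo, hi = (a, b) if keys[a] < keys[b] else (b, a)
        below = [i for i in range(n) if i == lo or less[i][lo]]
        above = [j for j in range(n) if j == hi or less[hi][j]]
        for i in below:
            for j in above:
                less[i][j] = True
    u, _ = counts(n, less)
    return sorted(range(n), key=lambda i: u[i]), comparisons


worst = 0
for perm in permutations(range(5)):
    order, c = heuristic_sort(list(perm))
    assert [perm[i] for i in order] == sorted(perm)
    worst = max(worst, c)
print("worst case for n = 5:", worst)
```

Comparing `worst` with $S(5) = 7$, and repeating for larger $n$, is the kind of experiment the exercise has in mind.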

► 20. [M26] Prove that an extended binary tree has minimum external path length if and only if there is a number $l$ such that all external nodes appear on levels $l$ and $l + 1$ (or perhaps all on a single level $l$).

21. [M21] The height of an extended binary tree is the maximum level number of its external nodes. If $x$ is an internal node of an extended binary tree, let $t(x)$ be the number of external nodes below $x$, and let $l(x)$ denote the root of $x$'s left subtree. If $x$ is an external node, let $t(x) = 1$. Prove that an extended binary tree has minimum height among all binary trees with the same number of nodes if

$$|t(x) - 2t(l(x))| \le 2^{\lceil \lg t(x) \rceil} - t(x)$$

for all internal nodes $x$.

22. [M24] Continuing exercise 21, prove that a binary tree has minimum external path length among all binary trees with the same number of nodes if and only if

$$|t(x) - 2t(l(x))| \le 2^{\lceil \lg t(x) \rceil} - t(x) \quad\text{and}\quad |t(x) - 2t(l(x))| \le t(x) - 2^{\lfloor \lg t(x) \rfloor}$$

for all internal nodes $x$. [Thus, for example, if $t(x) = 67$, we must have $t(l(x)) = 32$, 33, 34, or 35. If we merely wanted to minimize the height of the tree we could have $3 \le t(l(x)) \le 64$, by the preceding exercise.]
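
The example $t(x) = 67$ in exercise 22 can be checked mechanically. The sketch below (illustrative names) lists the left-subtree sizes that keep the external path length at its minimum (34), and compares them with the sizes permitted by the two inequalities of exercise 22 and by the single inequality of exercise 21.

```python
def ceil_lg(t):
    return (t - 1).bit_length()


def floor_lg(t):
    return t.bit_length() - 1


def min_epl(N):
    q = ceil_lg(N)
    return (q + 1) * N - 2**q            # Eq. (34)


def optimal_left_sizes(t):
    """Left sizes L for which optimal subtrees of sizes L and t - L give a minimum-path-length tree."""
    return [L for L in range(1, t) if min_epl(L) + min_epl(t - L) + t == min_epl(t)]


def ex22_left_sizes(t):
    return [L for L in range(1, t)
            if abs(t - 2 * L) <= 2**ceil_lg(t) - t and abs(t - 2 * L) <= t - 2**floor_lg(t)]


def ex21_left_sizes(t):
    return [L for L in range(1, t) if abs(t - 2 * L) <= 2**ceil_lg(t) - t]


print(optimal_left_sizes(67), ex22_left_sizes(67))   # [32, 33, 34, 35] [32, 33, 34, 35]
h = ex21_left_sizes(67)
print(h[0], h[-1])                                    # 3 64
```
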
23. [10] The text proves that the average number of comparisons made by any sorting method for $n$ elements must be at least $\lfloor \lg n! \rfloor \approx n \lg n$. But multiple list insertion (Program 5.2.1M) takes only $O(n)$ units of time on the average. How can this be?

24. [27] (C. Picard.) Find a sorting tree for six elements such that all external nodes appear on levels 10 and 11.

25. [11] If there were a sorting procedure for seven elements that achieves the minimum average number of comparisons predicted by the use of Eq. (34), how many external nodes would there be on level 13?

26. [M42] Find a sorting procedure for seven elements that minimizes the average number of comparisons performed.

► 27. [20] Suppose it is known that the configurations $K_1 < K_2 < K_3$, $K_1 < K_3 < K_2$, $K_2 < K_1 < K_3$, $K_2 < K_3 < K_1$, $K_3 < K_1 < K_2$, $K_3 < K_2 < K_1$ occur with respective probabilities .01, .25, .01, .24, .25, .24. Find a comparison tree that sorts these three elements with the smallest average number of comparisons.

28. [40] Write a MIX program that sorts five one-word keys in the minimum possible amount of time, and halts. (See the beginning of Section 5.2 for ground rules.)

29. [M25] (S. M. Chase.) Let $a_1 a_2 \ldots a_n$ be a permutation of $\{1, 2, \ldots, n\}$. Prove that any algorithm that decides whether this permutation is even or odd (that is, whether it has an even or odd number of inversions), based solely on comparisons between the $a$'s, must make at least $n \lg n$ comparisons, even though the algorithm has only two possible outcomes.

30. [M23] (Optimum exchange sorting.) Every exchange sorting algorithm as defined in Section 5.2.2 can be represented as a comparison-exchange tree, namely a binary tree structure whose internal nodes have the form $i:j$ for $i < j$, interpreted as the following operation: "If $K_i \le K_j$, continue by taking the left branch of the tree; if $K_i > K_j$, continue by interchanging records $i$ and $j$ and then taking the right branch of the tree." When an external node is encountered, it must be true that $K_1 \le K_2 \le \cdots \le K_n$. Thus, a comparison-exchange tree differs from a comparison tree in that it specifies data movement as well as comparison operations.

Let $S_e(n)$ denote the minimum number of comparison-exchanges needed, in the worst case, to sort $n$ elements by means of a comparison-exchange tree. Prove that $S_e(n) \le S(n) + n - 1$.

31. [M38] Continuing exercise 30, prove that $S_e(5) = 8$.

32. [M42] Continuing exercise 31, investigate $S_e(n)$ for small values of $n > 5$.

33. [M30] (T. N. Hibbard.) A real-valued search tree of order $x$ and resolution $\delta$ is an extended binary tree in which all nodes contain a nonnegative real value such that (i) the value in each external node is $\le \delta$, (ii) the value in each internal node is at most the sum of the values in its two children, and (iii) the value in the root is $x$. The weighted path length of such a tree is defined to be the sum, over all external nodes, of the level of that node times the value it contains.

Prove that a real-valued search tree of order $x$ and resolution 1 has minimum weighted path length, taken over all such trees of the same order and resolution, if and only if equality holds in (ii) and the following further conditions hold for all pairs of values $x_0$ and $x_1$ that are contained in sibling nodes: (iv) There is no integer $k \ge 0$ such that $x_0 < 2^k < x_1$ or $x_1 < 2^k < x_0$. (v) $\lceil x_0 \rceil - x_0 + \lceil x_1 \rceil - x_1 < 1$. (In particular if $x$ is an integer, condition (v) implies that all values in the tree are integers, and condition (iv) is equivalent to the result of exercise 22.)
Also prove that the corresponding minimum weighted path length is $x\lceil \lg x \rceil + \lceil x \rceil - 2^{\lceil \lg x \rceil}$.

34. [M50] Determine the exact value of $S(n)$ for infinitely many $n$.

35. [49] Determine the exact value of $S(16)$.

36. [M50] (S. S. Kislitsyn, 1968.) Prove or disprove: Any directed acyclic graph $G$ with $T(G) > 1$ has two vertices $u$ and $v$ such that the digraphs $G_1$ and $G_2$ obtained from $G$ by adding the arcs $u \leftarrow v$ and $u \to v$ are acyclic and satisfy $1 \le T(G_1)/T(G_2) \le 2$. (Thus $T(G_1)/T(G)$ always lies between $\frac12$ and $\frac23$ for some $u$ and $v$.)


*5.3.2.  Minimum-Comparison  Merging 

Let us now consider a related question: What is the best way to merge an ordered set of $m$ elements with an ordered set of $n$? Denoting the elements to be merged by

$$A_1 < A_2 < \cdots < A_m \quad\text{and}\quad B_1 < B_2 < \cdots < B_n, \qquad (1)$$

we shall assume as in Section 5.3.1 that the $m + n$ elements are distinct. The $A$'s may appear among the $B$'s in $\binom{m+n}{m}$ ways, so the arguments we have used for the sorting problem tell us immediately that at least

$$\left\lceil \lg \binom{m+n}{m} \right\rceil \qquad (2)$$

comparisons are required. If we set $m = \alpha n$ and let $n \to \infty$, while $\alpha$ is fixed, Stirling's approximation tells us that

$$\lg \binom{\alpha n + n}{\alpha n} = n\bigl((1+\alpha)\lg(1+\alpha) - \alpha \lg \alpha\bigr) - \tfrac12 \lg n + O(1). \qquad (3)$$

The normal merging procedure, Algorithm 5.2.4M, takes $m + n - 1$ comparisons in its worst case.

Let $M(m, n)$ denote the function analogous to $S(n)$, namely the minimum number of comparisons that will always suffice to merge $m$ things with $n$. By the observations we have just made,

$$\left\lceil \lg \binom{m+n}{m} \right\rceil \le M(m, n) \le m + n - 1 \quad\text{for all } m, n \ge 1. \qquad (4)$$

Formula (3) shows how far apart this lower bound and upper bound can be. When $\alpha = 1$ (that is, $m = n$), the lower bound is $2n - \frac12 \lg n + O(1)$, so both bounds have the right order of magnitude but the difference between them can be arbitrarily large. When $\alpha = 0.5$ (that is, $m = \frac12 n$), the lower bound is

$$\tfrac32 n\bigl(\lg 3 - \tfrac23\bigr) + O(\log n),$$

which is about $\lg 3 - \frac23 \approx 0.918$ times the upper bound. And as $\alpha$ decreases, the bounds get farther and farther apart, since the standard merging algorithm is primarily designed for files with $m \approx n$.
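
To see the gap in (4) numerically, here is a small Python sketch computing $\lceil \lg \binom{m+n}{m} \rceil$ exactly with integers, alongside $m + n - 1$, plus the $\alpha = \frac12$ ratio via `math.lgamma`:

```python
import math
from math import comb


def info_bound(m, n):
    """ceil(lg C(m+n, m)): the lower bound (2), computed exactly with integers."""
    return (comb(m + n, m) - 1).bit_length()


for m, n in [(1, 7), (2, 3), (5, 5), (10, 10), (3, 30)]:
    print(m, n, info_bound(m, n), m + n - 1)   # lower and upper bounds of (4)

# alpha = 1/2: the ratio of lower to upper bound tends to lg 3 - 2/3
n = 100_000
m = n // 2
lg_binom = (math.lgamma(m + n + 1) - math.lgamma(m + 1) - math.lgamma(n + 1)) / math.log(2)
ratio = lg_binom / (m + n - 1)
print(ratio, math.log2(3) - 2 / 3)             # both about 0.918
```
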
When $m = n$, the merging problem has a fairly simple solution; it turns out that the lower bound of (4), not the upper bound, is at fault. The following theorem was discovered independently by R. L. Graham and R. M. Karp about 1968:

Theorem M. For all $m \ge 1$, we have $M(m, m) = 2m - 1$.

Proof. Consider any algorithm that merges $A_1 < \cdots < A_m$ with $B_1 < \cdots < B_m$. When it compares $A_i:B_j$, take the branch $A_i < B_j$ if $i < j$, the branch $A_i > B_j$ if $i \ge j$. Merging must eventually terminate with the configuration

$$B_1 < A_1 < B_2 < A_2 < \cdots < B_m < A_m, \qquad (5)$$

since this is consistent with all the branches taken. And each of the $2m - 1$ comparisons

$$B_1:A_1,\quad A_1:B_2,\quad B_2:A_2,\quad \ldots,\quad B_m:A_m$$

must have been made explicitly, or else there would be at least two configurations consistent with the known facts. For example, if $A_1$ has not been compared to $B_2$, the configuration

$$B_1 < B_2 < A_1 < A_2 < \cdots < B_m < A_m$$

is indistinguishable from (5). ∎
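
The adversary in this proof is simple enough to run. In the sketch below (hypothetical helper names), the adversary answers $A_i < B_j$ exactly when $i < j$. The usual two-way merge then makes exactly $2m - 1$ comparisons and produces (5), and a different method, inserting each $A_i$ into the $B$'s by binary search, is also forced to make at least $2m - 1$.

```python
def adversary_less(i, j):
    """Theorem M's adversary: report A_i < B_j exactly when i < j (1-based indices)."""
    return i < j


def standard_merge(m, n, less):
    """Usual two-way merge of A_1..A_m with B_1..B_n; returns (merged order, comparisons)."""
    i = j = 1
    out, comparisons = [], 0
    while i <= m and j <= n:
        comparisons += 1
        if less(i, j):
            out.append(("A", i))
            i += 1
        else:
            out.append(("B", j))
            j += 1
    out += [("A", k) for k in range(i, m + 1)] + [("B", k) for k in range(j, n + 1)]
    return out, comparisons


def binary_insertion_merge(m, n, less):
    """Insert A_1, A_2, ... into the B's by binary search; slot s means B_{s-1} < A_i < B_s."""
    comparisons, lo, slots = 0, 1, []
    for i in range(1, m + 1):
        a, b = lo, n + 1                 # A_i > A_{i-1}, so its slot is at least the previous one
        while a < b:
            mid = (a + b) // 2
            comparisons += 1
            if less(i, mid):
                b = mid
            else:
                a = mid + 1
        slots.append(a)
        lo = a
    return slots, comparisons


for m in range(1, 21):
    out, c1 = standard_merge(m, m, adversary_less)
    slots, c2 = binary_insertion_merge(m, m, adversary_less)
    assert out == [x for k in range(1, m + 1) for x in (("B", k), ("A", k))]  # configuration (5)
    assert slots == list(range(2, m + 2))
    assert c1 == 2 * m - 1 and c2 >= 2 * m - 1
print("both methods were forced to make at least 2m - 1 comparisons")
```
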

A simple modification of this proof yields the companion formula

$$M(m, m+1) = 2m, \quad\text{for } m \ge 0. \qquad (6)$$
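
For very small $m$ and $n$, $M(m, n)$ can be computed exhaustively, which gives an independent check of Theorem M and (6). This brute-force sketch records an outcome as the tuple $(k_1, \ldots, k_m)$, where $k_i$ is the number of $B$'s preceding $A_i$, so $A_i < B_j$ exactly when $k_i < j$. It is exponential and only meant for $m + n \le 7$ or so.

```python
from functools import lru_cache
from itertools import combinations_with_replacement


def M(m, n):
    """Minimum worst-case comparisons to merge A_1<...<A_m with B_1<...<B_n (tiny sizes only).

    Outcome encoding: (k_1, ..., k_m) with k_i = number of B's that precede A_i,
    so A_i < B_j (1-based j) holds exactly when k_i < j.
    """
    all_outcomes = frozenset(combinations_with_replacement(range(n + 1), m))

    @lru_cache(maxsize=None)
    def cost(state):
        if len(state) <= 1:
            return 0                      # the merge is fully determined
        best = None
        for i in range(m):
            for j in range(1, n + 1):
                yes = frozenset(k for k in state if k[i] < j)   # outcomes with A_{i+1} < B_j
                if not yes or len(yes) == len(state):
                    continue                                    # this comparison tells us nothing
                worst = 1 + max(cost(yes), cost(state - yes))
                if best is None or worst < best:
                    best = worst
        return best

    return cost(all_outcomes)


print([M(m, m) for m in range(1, 4)])        # Theorem M: [1, 3, 5]
print([M(m, m + 1) for m in range(0, 4)])    # formula (6): [0, 2, 4, 6]
```
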

Constructing lower bounds. Theorem M shows that the "information-theoretic" lower bound (2) can be arbitrarily far from the true value; thus the technique used to prove Theorem M gives us another way to discover lower bounds. Such a proof technique is often viewed as the creation of an adversary, a pernicious being who tries to make algorithms run slowly. When an algorithm for merging decides to compare $A_i:B_j$, the adversary determines the fate of the comparison so as to force the algorithm down the more difficult path. If we can invent a suitable adversary, as in the proof of Theorem M, we can ensure that every valid merging algorithm will have to make quite a few comparisons.

We shall make use of constrained adversaries, whose power is limited with regard to the outcomes of certain comparisons. A merging method that is under the influence of a constrained adversary does not know about the constraints, so it must make the necessary comparisons even though their outcomes have been predetermined. For example, in our proof of Theorem M we constrained all outcomes by condition (5), yet the merging algorithm was unable to make use of that fact in order to avoid any of the comparisons.

The constraints we shall use in the following discussion apply to the left and right ends of the files. Left constraints are symbolized by

. (meaning no left constraint),
\ (meaning that all outcomes must be consistent with $A_1 < B_1$),
/ (meaning that all outcomes must be consistent with $A_1 > B_1$);

similarly, right constraints are symbolized by

. (meaning no right constraint),
\ (meaning that all outcomes must be consistent with $A_m < B_n$),
/ (meaning that all outcomes must be consistent with $A_m > B_n$).

There are nine kinds of adversaries, denoted by $\lambda M\rho$, where $\lambda$ is a left constraint and $\rho$ is a right constraint. For example, a $\backslash M\backslash$ adversary must say that $A_1 < B_1$ and $A_m < B_n$; a $.M.$ adversary is unconstrained. For small values of $m$ and $n$, constrained adversaries of certain kinds are impossible; when $m = 1$ we obviously can't have a $\backslash M/$ adversary.

Let us now construct a rather complicated, but very formidable, adversary for merging. It does not always produce optimum results, but it gives lower bounds that cover a lot of interesting cases. Given $m$, $n$, and the left and right constraints $\lambda$ and $\rho$, suppose the adversary is asked which is the greater of $A_i$ or $B_j$. Six strategies can be used to reduce the problem to cases of smaller $m + n$:

Strategy A(k, l), for $i \le k \le m$ and $1 \le l \le j$. Say that $A_i < B_j$, and require that subsequent operations merge $\{A_1, \ldots, A_k\}$ with $\{B_1, \ldots, B_{l-1}\}$ and $\{A_{k+1}, \ldots, A_m\}$ with $\{B_l, \ldots, B_n\}$. Thus future comparisons $A_p:B_q$ will result in $A_p < B_q$ if $p \le k$ and $q \ge l$; $A_p > B_q$ if $p > k$ and $q < l$; they will be handled by a $(k, l-1, \lambda, .)$ adversary if $p \le k$ and $q < l$; they will be handled by an $(m-k, n+1-l, ., \rho)$ adversary if $p > k$ and $q \ge l$.

Strategy B(k, l), for $i \le k \le m$ and $1 \le l \le j$. Say that $A_i < B_j$, and require that subsequent operations merge $\{A_1, \ldots, A_k\}$ with $\{B_1, \ldots, B_l\}$ and $\{A_{k+1}, \ldots, A_m\}$ with $\{B_l, \ldots, B_n\}$, stipulating that $A_k < B_l < A_{k+1}$. (Note that $B_l$ appears in both lists to be merged. The condition $A_k < B_l < A_{k+1}$ ensures that merging one group gives no information that could help to merge the other.) Thus future comparisons $A_p:B_q$ will result in $A_p < B_q$ if $p \le k$ and $q > l$; $A_p > B_q$ if $p > k$ and $q < l$; they will be handled by a $(k, l, \lambda, \backslash)$ adversary if $p \le k$ and $q \le l$; by an $(m-k, n+1-l, /, \rho)$ adversary if $p > k$ and $q \ge l$.

Strategy C(k, l), for $i \le k \le m$ and $1 \le l \le j$. Say that $A_i < B_j$, and require that subsequent operations merge $\{A_1, \ldots, A_k\}$ with $\{B_1, \ldots, B_{l-1}\}$ and $\{A_k, \ldots, A_m\}$ with $\{B_l, \ldots, B_n\}$, stipulating that $B_{l-1} < A_k < B_l$. (Analogous to Strategy B, interchanging the roles of $A$ and $B$.)

Strategy A′(k, l), for $1 \le k \le i$ and $j \le l \le n$. Say that $A_i > B_j$, and require the merging of $\{A_1, \ldots, A_{k-1}\}$ with $\{B_1, \ldots, B_l\}$ and $\{A_k, \ldots, A_m\}$ with $\{B_{l+1}, \ldots, B_n\}$. (Analogous to Strategy A.)

Strategy B′(k, l), for $1 \le k \le i$ and $j \le l \le n$. Say that $A_i > B_j$, and require the merging of $\{A_1, \ldots, A_{k-1}\}$ with $\{B_1, \ldots, B_l\}$ and $\{A_k, \ldots, A_m\}$ with $\{B_l, \ldots, B_n\}$, subject to $A_{k-1} < B_l < A_k$. (Analogous to Strategy B.)

Strategy C′(k, l), for $1 \le k \le i$ and $j \le l \le n$. Say that $A_i > B_j$, and require the merging of $\{A_1, \ldots, A_k\}$ with $\{B_1, \ldots, B_l\}$ and $\{A_k, \ldots, A_m\}$ with $\{B_{l+1}, \ldots, B_n\}$, subject to $B_l < A_k < B_{l+1}$. (Analogous to Strategy C.)

Because of the constraints, the strategies above cannot be used in certain cases summarized here:

Strategy                          Must be omitted when
A(k, 1), B(k, 1), C(k, 1)         λ = /
A′(1, l), C′(1, l)                λ = \
A(m, l), B(m, l), C(m, l)         ρ = /
A′(k, n), B′(k, n), C′(k, n)      ρ = \

Let $\lambda M\rho(m, n)$ denote the maximum lower bound for merging that is obtainable by an adversary of the class described above. Each strategy, when applicable, gives us an inequality relating these nine functions, when the first comparison is $A_i:B_j$, namely,

$$\begin{aligned}
&A(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M.(k, l-1) + .M\rho(m-k, n+1-l);\\
&B(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M\backslash(k, l) + /M\rho(m-k, n+1-l);\\
&C(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M/(k, l-1) + \backslash M\rho(m+1-k, n+1-l);\\
&A'(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M.(k-1, l) + .M\rho(m+1-k, n-l);\\
&B'(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M\backslash(k-1, l) + /M\rho(m+1-k, n+1-l);\\
&C'(k,l){:} && \lambda M\rho(m,n) \ge 1 + \lambda M/(k, l) + \backslash M\rho(m+1-k, n-l).
\end{aligned}$$

For fixed $i$ and $j$, the adversary will adopt a strategy that maximizes the lower bound given by all possible right-hand sides, when $k$ and $l$ lie in the ranges permitted by $i$ and $j$. Then we define $\lambda M\rho(m, n)$ to be the minimum of these lower bounds taken over $1 \le i \le m$ and $1 \le j \le n$. When $m$ or $n$ is zero, $\lambda M\rho(m, n)$ is zero.

For example, consider the case m = 2 and n = 3, and suppose that our adversary is unconstrained. If the first comparison is $A_1:B_1$, the adversary may adopt strategy A'(1,1), requiring ${.}\mathcal{M}{.}(0,1) + {.}\mathcal{M}{.}(2,2) = 3$ further comparisons. If the first comparison is $A_1:B_3$, the adversary may adopt strategy B(1,2), requiring ${.}\mathcal{M}{\backslash}(1,2) + {/}\mathcal{M}{.}(1,2) = 4$ further comparisons. No matter what comparison $A_i:B_j$ is made first, the adversary can guarantee that at least three further comparisons must be made. Hence ${.}\mathcal{M}{.}(2,3) = 4$.

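It isn't easy to do these calculations by hand, but a computer can grind out tables of $\lambda\mathcal{M}\rho$ functions rather quickly. There are obvious symmetries, such as

$${/}\mathcal{M}{.}(m,n) = {.}\mathcal{M}{\backslash}(m,n) = {\backslash}\mathcal{M}{.}(n,m) = {.}\mathcal{M}{/}(n,m), \qquad (7)$$

by means of which we can reduce the nine functions to just four,

$${.}\mathcal{M}{.}(m,n),\quad {/}\mathcal{M}{.}(m,n),\quad {/}\mathcal{M}{\backslash}(m,n),\quad\text{and}\quad {/}\mathcal{M}{/}(m,n).$$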
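The recurrences above are easy to mechanize, and the following Python sketch evaluates $\lambda\mathcal{M}\rho(m,n)$ by brute force with memoization. It rests on assumptions that go beyond this excerpt. First, the constraint symbols are read the way the strategies use them: '/' on the left means $A_1 > B_1$ and '\' on the left means $A_1 < B_1$; on the right the same symbols refer to $A_m$ versus $B_n$. Second, the unprimed strategies, whose definitions precede this excerpt, are taken to range over $i \le k \le m$ and $1 \le l \le j$, with $l < j$ for B and $k > i$ for C. The primed ones range over $1 \le k \le i$ and $j \le l \le n$, with $l > j$ for B' and $k < i$ for C'. The function name `adversary_bound` is hypothetical.

```python
from functools import lru_cache

# Constraint symbols (an assumption inferred from the strategies in the text):
#   lam == '/' : adversary must keep A_1 > B_1    lam == '\\' : must keep A_1 < B_1
#   rho == '/' : adversary must keep A_m > B_n    rho == '\\' : must keep A_m < B_n
#   '.'        : no constraint on that side

@lru_cache(maxsize=None)
def adversary_bound(lam, rho, m, n):
    """Lower bound lam-M-rho(m, n) from the strategies A, B, C, A', B', C'."""
    if m == 0 or n == 0:
        return 0
    best = None
    for i in range(1, m + 1):                 # first comparison is A_i : B_j
        for j in range(1, n + 1):
            bound = 1                          # at least the first comparison itself
            # Adversary answers A_i < B_j.
            for k in range(i, m + 1):
                for l in range(1, j + 1):
                    if (l == 1 and lam == '/') or (k == m and rho == '/'):
                        continue               # excluded by the table of omissions
                    bound = max(bound, 1 + adversary_bound(lam, '.', k, l - 1)
                                + adversary_bound('.', rho, m - k, n + 1 - l))          # A(k,l)
                    if l < j:
                        bound = max(bound, 1 + adversary_bound(lam, '\\', k, l)
                                    + adversary_bound('/', rho, m - k, n + 1 - l))      # B(k,l)
                    if k > i:
                        bound = max(bound, 1 + adversary_bound(lam, '/', k, l - 1)
                                    + adversary_bound('\\', rho, m + 1 - k, n + 1 - l))  # C(k,l)
            # Adversary answers A_i > B_j.
            for k in range(1, i + 1):
                for l in range(j, n + 1):
                    if (k == 1 and lam == '\\') or (l == n and rho == '\\'):
                        continue
                    bound = max(bound, 1 + adversary_bound(lam, '.', k - 1, l)
                                + adversary_bound('.', rho, m + 1 - k, n - l))          # A'(k,l)
                    if l > j:
                        bound = max(bound, 1 + adversary_bound(lam, '\\', k - 1, l)
                                    + adversary_bound('/', rho, m + 1 - k, n + 1 - l))  # B'(k,l)
                    if k < i:
                        bound = max(bound, 1 + adversary_bound(lam, '/', k, l)
                                    + adversary_bound('\\', rho, m + 1 - k, n - l))     # C'(k,l)
            best = bound if best is None else min(best, bound)
    return best

print(adversary_bound('.', '.', 2, 3))   # 4
```

The two terms of the B(1,2) calculation in the worked example, `adversary_bound('.', '\\', 1, 2)` and `adversary_bound('/', '.', 1, 2)`, each come out as 2. Because the strategy set is closed under reversing the order and under exchanging A with B, the computed values also satisfy the symmetries (7).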

Table 1 shows the resulting values for all m, n ≤ 10; our merging adversary has been defined in such a way that

$${.}\mathcal{M}{.}(m,n) \le M(m,n) \qquad \text{for all } m, n \ge 0. \qquad (8)$$
Table 1

LOWER BOUNDS FOR MERGING, FROM THE “ADVERSARY”

[Table: ${.}\mathcal{M}{.}(m,n)$ (left panel) and ${/}\mathcal{M}{.}(m,n)$ (right panel), for $1 \le m, n \le 10$.]
This relation includes Theorem M as a special case, because our adversary will use the simple strategy of that theorem when $|m - n| \le 1$.

Let us now consider some simple relations satisfied by the M function:

$$\begin{aligned}
M(m,n) &= M(n,m); && (9)\\
M(m,n) &\le M(m,n+1); && (10)\\
M(k+m,n) &\le M(k,n) + M(m,n); && (11)\\
M(m,n) &\le \max\bigl(M(m,n-1)+1,\ M(m-1,n)+1\bigr), \quad \text{for } m \ge 1,\ n \ge 1; && (12)\\
M(m,n) &\le \max\bigl(M(m,n-2)+1,\ M(m-1,n)+2\bigr), \quad \text{for } m \ge 1,\ n \ge 2. && (13)
\end{aligned}$$

Relation (12) comes from the usual merging procedure, if we first compare $A_1:B_1$. Relation (13) is derived similarly, by first comparing $A_1:B_2$; if $A_1 > B_2$, we need $M(m,n-2)$ more comparisons, but if $A_1 < B_2$, we can insert $A_1$ into its proper place and merge $\{A_2,\ldots,A_m\}$ with $\{B_1,\ldots,B_n\}$. Generalizing, we can see that if $m \ge 1$ and $n \ge k$ we have

$$M(m,n) \le \max\bigl(M(m,n-k)+1,\ M(m-1,n)+1+\lceil\lg k\rceil\bigr), \qquad (14)$$

by first comparing $A_1:B_k$ and using binary search if $A_1 < B_k$.

It turns out that $M(m,n) = {.}\mathcal{M}{.}(m,n)$ for all m, n ≤ 10, so Table 1 actually gives the optimum values for merging. This can be proved by using (9)–(14) together with special constructions for (m, n) = (2, 8), (3, 6), and (5, 9) given in exercises 8, 9, and 10.




On the other hand, our adversary doesn't always give the best possible lower bounds; the simplest example is m = 3, n = 11, when ${.}\mathcal{M}{.}(3,11) = 9$ but $M(3,11) = 10$. To see where the adversary has “failed” in this case, we must study the reasons for its decisions. Further scrutiny reveals that if $(i,j) \ne (2,6)$, the adversary can find a strategy that demands 10 comparisons; but when $(i,j) = (2,6)$, no strategy beats Strategy A(2,4), leading to the lower bound $1 + {.}\mathcal{M}{.}(2,3) + {.}\mathcal{M}{.}(1,8) = 9$. It is necessary but not sufficient to finish by merging $\{A_1,A_2\}$ with $\{B_1,B_2,B_3\}$ and $\{A_3\}$ with $\{B_4,\ldots,B_{11}\}$, so the lower bound fails to be sharp in this case.

Similarly it can be shown that ${.}\mathcal{M}{.}(2,38) = 10$ while $M(2,38) = 11$, so our adversary isn't even good enough to solve the case m = 2. But there is an infinite class of values for which it excels:

Theorem K. $M(m,m+2) = 2m+1$, for $m \ge 2$;

$M(m,m+3) = 2m+2$, for $m \ge 4$;

$M(m,m+4) = 2m+3$, for $m \ge 6$.

Proof. We can in fact prove the result with M replaced by ${.}\mathcal{M}{.}$; for small m the results have been obtained by computer, so we may assume that m is sufficiently large. We may also assume that the first comparison is $A_i:B_j$ where $i \le \lceil m/2\rceil$. If $j \le i$ we use strategy A'(i,i), obtaining

$${.}\mathcal{M}{.}(m,m+d) \ge 1 + {.}\mathcal{M}{.}(i-1,i) + {.}\mathcal{M}{.}(m+1-i,\,m+d-i) = 2m+d-1$$

by induction on d, for $d \le 4$. If $j > i$ we use strategy A(i,i+1), obtaining

$${.}\mathcal{M}{.}(m,m+d) \ge 1 + {.}\mathcal{M}{.}(i,i) + {.}\mathcal{M}{.}(m-i,\,m+d-i) = 2m+d-1$$

by induction on m. ∎

The first two parts of Theorem K were obtained by F. Hwang and S. Lin in 1969. Paul Stockmeyer and Frances Yao showed several years later that the pattern evident in these three formulas holds in general, namely that the lower bounds derived by the adversarial strategies above suffice to establish the values $M(m,m+d) = 2m+d-1$ for $m \ge 2d-2$. [SICOMP 9 (1980), 85-90.]

Upper bounds. Now let us consider upper bounds for M(m,n); good upper bounds correspond to efficient merging algorithms.

When m = 1 the merging problem is equivalent to an insertion problem, and there are n + 1 places in which $A_1$ might fall among $B_1, \ldots, B_n$. For this case it is easy to see that any extended binary tree with n + 1 external nodes is the tree for some merging method! (See exercise 2.) Hence we may choose an optimum binary tree, realizing the information-theoretic lower bound

$$1 + \lfloor\lg n\rfloor = M(1,n) = \lceil\lg(n+1)\rceil. \qquad (15)$$

Binary search (Section 6.2.1) is, of course, a simple way to attain this value.
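For m = 1, here is a minimal sketch of binary insertion that counts its comparisons. It assumes distinct keys and uses the same bisection loop that Python's `bisect.bisect_left` performs, written out so each comparison can be counted. Taking the maximum over all n + 1 places a new key can fall reproduces (15): the worst case is $\lfloor\lg n\rfloor + 1$, which Python exposes as `n.bit_length()`. The function names are illustrative.

```python
def insert_position(x, keys):
    """Number of keys < x in the sorted list keys, found by bisection.
    Returns (position, comparisons used)."""
    lo, hi, comps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comps += 1
        if keys[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo, comps

def worst_case_insertion(n):
    """Maximum comparisons over the n + 1 places a new key can fall among n keys."""
    keys = list(range(0, 2 * n, 2))          # 0, 2, ..., 2n - 2
    probes = range(-1, 2 * n, 2)             # -1, 1, ..., 2n - 1: one probe per gap
    return max(insert_position(x, keys)[1] for x in probes)

print([worst_case_insertion(n) for n in range(1, 9)])   # [1, 2, 2, 3, 3, 3, 3, 4]
```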

The case m = 2 is extremely interesting, but considerably harder. It has been solved completely by R. L. Graham, F. K. Hwang, and S. Lin (see exercises 11, 12, and 13), who proved the general formula

$$M(2,n) = \Bigl\lceil \lg \tfrac{7}{12}(n+1) \Bigr\rceil + \Bigl\lceil \lg \tfrac{14}{17}(n+1) \Bigr\rceil. \qquad (16)$$
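A small sketch that evaluates (16) with exact rational arithmetic (Python's `fractions.Fraction`). Here $\lceil\lg x\rceil$ is found by comparing x with powers of two instead of taking a floating-point logarithm, which could misround when x is close to a power of two. The function names are illustrative.

```python
from fractions import Fraction

def ceil_lg(x):
    """Smallest integer t >= 0 with 2**t >= x (returns 0 whenever x <= 1)."""
    t = 0
    while (1 << t) < x:
        t += 1
    return t

def M2(n):
    """Formula (16): minimum worst-case comparisons to merge 2 keys with n keys."""
    return ceil_lg(Fraction(7, 12) * (n + 1)) + ceil_lg(Fraction(14, 17) * (n + 1))

print([M2(n) for n in range(1, 11)])   # [2, 3, 4, 5, 5, 6, 6, 6, 7, 7]
```

For example, `M2(38)` is 11, since $\tfrac{7}{12}\cdot 39 = 22.75$ and $\tfrac{14}{17}\cdot 39 \approx 32.12$ lie just above 16 and 32.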

We have seen that the usual merging procedure is optimum when m = n, while the rather different binary search procedure is optimum when m = 1. What we need is an in-between method that combines the normal merging algorithm with binary search in such a way that the best features of both are retained. Formula (14) suggests the following algorithm, due to F. K. Hwang and S. Lin [SICOMP 1 (1972), 31-39]:

Algorithm H (Binary merging).

H1. [If not done, choose t.] If m or n is zero, stop. Otherwise, if m > n, set $t \leftarrow \lfloor\lg(m/n)\rfloor$ and go to step H4. Otherwise set $t \leftarrow \lfloor\lg(n/m)\rfloor$.

H2. [Compare.] Compare $A_m : B_{n+1-2^t}$. If $A_m$ is smaller, set $n \leftarrow n - 2^t$ and return to step H1.

H3. [Insert.] Using binary search (which requires exactly t more comparisons), insert $A_m$ into its proper place among $\{B_{n+1-2^t}, \ldots, B_n\}$. If k is maximal such that $B_k < A_m$, set $m \leftarrow m - 1$ and $n \leftarrow k$. Return to H1.

H4. [Compare.] (Steps H4 and H5 are like H2 and H3, interchanging the roles of m and n, A and B.) If $B_n < A_{m+1-2^t}$, set $m \leftarrow m - 2^t$ and return to step H1.

H5. [Insert.] Insert $B_n$ into its proper place among the A's. If k is maximal such that $A_k < B_n$, set $m \leftarrow k$ and $n \leftarrow n - 1$. Return to H1. ∎
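What follows is a direct Python transcription of Algorithm H, offered as a sketch. Lists are 0-indexed and keys are assumed distinct. The output is built from the largest key downward, as in the steps above, and every key comparison is counted. `binary_merge` is an illustrative name. On the data of Table 2 it uses eight comparisons; leading zeros are dropped from the keys because Python integer literals cannot start with 0.

```python
def binary_merge(a, b):
    """Hwang-Lin binary merging (Algorithm H) of sorted lists of distinct keys.
    Returns (merged list, number of key comparisons)."""
    a, b = list(a), list(b)
    out = []                                   # output, collected largest key first
    comps = 0

    def less(x, y):
        nonlocal comps
        comps += 1
        return x < y

    def count_below(x, keys, lo, hi):
        # Binary search for the number of keys < x, known to lie in [lo, hi].
        while lo < hi:
            mid = (lo + hi) // 2
            if less(keys[mid], x):
                lo = mid + 1
            else:
                hi = mid
        return lo

    while a and b:                             # H1
        m, n = len(a), len(b)
        if m <= n:
            t = (n // m).bit_length() - 1      # floor(lg(n/m))
            p = n - (1 << t)                   # 0-based index of B_{n+1-2^t}
            if less(a[-1], b[p]):              # H2: B_{n+1-2^t} .. B_n all exceed A_m
                out.extend(reversed(b[p:]))
                del b[p:]
            else:                              # H3: exactly t more comparisons
                k = count_below(a[-1], b, p + 1, n)
                out.extend(reversed(b[k:]))
                del b[k:]
                out.append(a.pop())
        else:
            t = (m // n).bit_length() - 1      # floor(lg(m/n))
            p = m - (1 << t)
            if less(b[-1], a[p]):              # H4
                out.extend(reversed(a[p:]))
                del a[p:]
            else:                              # H5
                k = count_below(b[-1], a, p + 1, m)
                out.extend(reversed(a[k:]))
                del a[k:]
                out.append(b.pop())
    out.extend(reversed(a or b))               # whatever remains is smallest
    out.reverse()
    return out, comps

A = [87, 503, 512]
B = [61, 154, 170, 275, 426, 509, 612, 653, 677, 703, 765, 897, 908]
print(binary_merge(A, B)[1])   # 8
```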

As an example of this algorithm, Table 2 shows the process of merging the three keys {087, 503, 512} with thirteen keys {061, 154, ..., 908}; eight comparisons are required in this example. The comparisons made at each step are listed in the second column.


Table 2

EXAMPLE OF BINARY MERGING

Step  Comparisons        A remaining    B remaining                                          Output at this step
0     (start)            087 503 512    061 154 170 275 426 509 612 653 677 703 765 897 908
1     512:703            087 503 512    061 154 170 275 426 509 612 653 677                  703 765 897 908
2     512:653            087 503 512    061 154 170 275 426 509 612                          653 677
3     512:509, 512:612   087 503        061 154 170 275 426 509                              512 612
4     503:426, 503:509   087            061 154 170 275 426                                  503 509
5     087:154            087            061                                                  154 170 275 426
6     087:061                           061                                                  087
7     (done)                                                                                 061

Merged result: 061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

Let H(m,n) be the maximum number of comparisons required by Hwang and Lin's algorithm. To calculate H(m,n), we may assume that k = n in step H3 and k = m in step H5, since we shall prove that $H(m-1,n) \le H(m-1,n+1)$ for all $n \ge m-1$ by induction on m. Thus when $m \le n$ we have

$$H(m,n) = \max\bigl(H(m,n-2^t)+1,\ H(m-1,n)+t+1\bigr), \qquad (17)$$

for $2^t m \le n < 2^{t+1} m$. Replace n by $2n + \epsilon$, with $\epsilon = 0$ or 1, to get

$$H(m,2n+\epsilon) = \max\bigl(H(m,2n+\epsilon-2^{t+1})+1,\ H(m-1,2n+\epsilon)+t+2\bigr),$$

for $2^t m \le n < 2^{t+1} m$; and it follows by induction on n that

$$H(m,2n+\epsilon) = H(m,n) + m, \qquad \text{for } m \le n \text{ and } \epsilon = 0 \text{ or } 1. \qquad (18)$$

It is also easy to see that $H(m,n) = m+n-1$ when $m \le n < 2m$; hence a repeated application of (18) yields the general formula

$$H(m,n) = m + \lfloor n/2^t\rfloor - 1 + tm, \qquad \text{for } m \le n,\ t = \lfloor\lg(n/m)\rfloor. \qquad (19)$$

This implies that $H(m,n) \le H(m,n+1)$ for all $n \ge m$, verifying our inductive hypothesis about step H3.
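The passage from (17) to (19) can be checked mechanically. This sketch evaluates recurrence (17) with memoization and extends it to m > n through the symmetry H(m,n) = H(n,m), since steps H4 and H5 mirror H2 and H3. It then compares the result with the closed form (19). The function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def H_rec(m, n):
    """Worst-case comparisons of Algorithm H via recurrence (17),
    extended to m > n by the symmetry H(m, n) = H(n, m)."""
    if m == 0 or n == 0:
        return 0
    if m > n:
        return H_rec(n, m)
    t = (n // m).bit_length() - 1          # floor(lg(n/m)), so 2^t m <= n < 2^(t+1) m
    return max(H_rec(m, n - (1 << t)) + 1, H_rec(m - 1, n) + t + 1)

def H_closed(m, n):
    """Closed form (19), valid for 1 <= m <= n."""
    t = (n // m).bit_length() - 1
    return m + (n >> t) - 1 + t * m

print(H_rec(3, 13), H_closed(3, 13))   # 11 11
```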

Setting $m = \alpha n$ and $\theta = \lg(n/m) - t$ gives

$$H(\alpha n, n) = \alpha n\,(1 + 2^\theta - \theta - \lg\alpha) + O(1), \qquad (20)$$

as $n \to \infty$. We know by Eq. 5.3.1–(36) that $1.9139 < 1 + 2^\theta - \theta \le 2$; hence (20) may be compared with the information-theoretic lower bound (3). Hwang and Lin have proved (see exercise 17) that

$$H(m,n) \le \Bigl\lceil \lg\binom{m+n}{m} \Bigr\rceil + \min(m,n). \qquad (21)$$
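For reference, here is one way to see how (20) follows from (19), together with the location of the minimum quoted from Eq. 5.3.1–(36). Only the definitions in the text are used; the $O(1)$ term absorbs the floor and the constant $-1$.

```latex
\begin{aligned}
&\text{From (19), with } t = \lfloor \lg(n/m) \rfloor,\ \theta = \lg(n/m) - t \in [0,1),\ m = \alpha n:\\
&\qquad n/2^t = m \cdot 2^{\lg(n/m) - t} = 2^{\theta} m, \qquad
   t = \lg(n/m) - \theta = -\lg\alpha - \theta,\\
&\qquad H(\alpha n, n) = m + \lfloor n/2^t \rfloor - 1 + tm
   = \alpha n\,\bigl(1 + 2^{\theta} - \theta - \lg\alpha\bigr) + O(1).\\
&\text{Minimum of } f(\theta) = 1 + 2^{\theta} - \theta:\quad
   f'(\theta) = 2^{\theta}\ln 2 - 1 = 0 \iff \theta = -\lg\ln 2 \approx 0.5288,\\
&\qquad f(-\lg\ln 2) = 1 + \frac{1}{\ln 2} + \lg\ln 2 \approx 1.91393,
   \qquad f(0) = \lim_{\theta \to 1} f(\theta) = 2.
\end{aligned}
```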


The Hwang-Lin binary merging algorithm does not always give optimum results, but it has the great virtue that it can be programmed rather easily. It reduces to “uncentered binary search” when m = 1, and it reduces to the usual merging procedure when m ≈ n, so it represents an excellent compromise between those two methods. Furthermore, it is optimum in many cases (see exercise 16). Improved algorithms have been found by F. K. Hwang and D. N. Deutsch, JACM 20 (1973), 148-159; G. K. Manacher, JACM 26 (1979), 434-440; and most notably by C. Christen, FOCS 19 (1978), 259-266. Christen's merging procedure, called forward-testing-backward-insertion, saves about m/3 comparisons over Algorithm H when $n/m \to \infty$. Moreover, Christen's procedure achieves the lower bound ${.}\mathcal{M}{.}(m,n) = \lfloor(11m + n - 3)/4\rfloor$ when $5m - 3 \le n \le 7m + 2[m \text{ even}]$; hence it is optimum in such cases (and, remarkably, so is our adversarial lower bound).

Formula (18) suggests that the M function itself might satisfy

$$M(m,n) \le M(m,\lfloor n/2\rfloor) + m. \qquad (22)$$

This is actually true (see exercise 19). Tables of M(m,n) suggest several other plausible relations, such as

$$\begin{aligned}
M(m+1,n) &\ge 1 + M(m,n) \ge M(m,n+1), \quad \text{for } m \le n; && (23)\\
M(m+1,n+1) &\ge 2 + M(m,n); && (24)
\end{aligned}$$

but no proof of these inequalities is known.
EXERCISES

1. [15] Find an interesting relation between M(m,n) and the function S defined in Section 5.3.1. [Hint: Consider S(m + n).]

► 2. [22] When m = 1, every merging algorithm without redundant comparisons defines an extended binary tree with $\binom{m+n}{m} = n + 1$ external nodes. Prove that, conversely, every extended binary tree with n + 1 external nodes corresponds to some merging algorithm with m = 1.

3. [M24] Prove that ${.}\mathcal{M}{.}(1,n) = M(1,n)$ for all n.

4. [M42] Is ${.}\mathcal{M}{.}(m,n) \ge \bigl\lceil \lg\binom{m+n}{m} \bigr\rceil$ for all m and n?

5. [M30] Prove that ${.}\mathcal{M}{.}(m,n) \le {.}\mathcal{M}{\backslash}(m,n+1)$.

6. [M26] The stated proof of Theorem K requires that a lot of cases be verified by computer. How can the number of such cases be drastically reduced?

7. [21] Prove (11).

► 8. [24] Prove that $M(2,8) \le 6$, by finding an algorithm that merges two elements with eight others using at most six comparisons.

9. [21] Prove that three elements can be merged with six in at most seven steps.

10. [33] Prove that five elements can be merged with nine in at most twelve steps. [Hint: Experience with the adversary suggests first comparing $A_1:B_2$, then trying $A_5:B_8$ if $A_1 < B_2$.]

11. [M40] (F. Hwang, S. Lin.) Let $g_{2k} = \lfloor 2^k \cdot \tfrac{17}{14} \rfloor$, $g_{2k+1} = \lfloor 2^k \cdot \tfrac{12}{7} \rfloor$, for $k \ge 0$, so that $(g_0, g_1, g_2, \ldots) = (1, 1, 2, 3, 4, 6, 9, 13, 19, 27, 38, 54, 77, \ldots)$. Prove that it takes more than t comparisons to merge two elements with $g_t$ elements, in the worst case; but two elements can be merged with $g_t - 1$ in at most t steps. [Hint: Show that if $n = g_t$ or $n = g_t - 1$ and if we want to merge $\{A_1,A_2\}$ with $\{B_1,B_2,\ldots,B_n\}$ in t comparisons, we can't do better than to compare $A_2 : B_{g_{t-1}}$ on the first step.]

12. [M21] Let $R_n(i,j)$ be the least number of comparisons required to sort the distinct objects $\{\alpha, \beta, X_1, \ldots, X_n\}$, given the relations

$$\alpha < \beta, \qquad X_1 < X_2 < \cdots < X_n, \qquad \alpha < X_{i+1}, \qquad \beta > X_{n-j}.$$

(The condition $\alpha < X_{i+1}$ or $\beta > X_{n-j}$ becomes vacuous when $i \ge n$ or $j \ge n$. Therefore $R_n(n,n) = M(2,n)$.)

Clearly, $R_n(0,0) = 0$. Prove that

$$R_n(i,j) = 1 + \min\Bigl(\min_{1 \le k \le i} \max\bigl(R_n(k-1,j),\ R_{n-k}(i-k,j)\bigr),\ \min_{1 \le k \le j} \max\bigl(R_n(i,k-1),\ R_{n-k}(i,j-k)\bigr)\Bigr)$$

for $0 \le i \le n$, $0 \le j \le n$, $i + j > 0$.

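13. [M42] (R. L. Graham.) Show that the solution to the recurrence in exercise 12 may be expressed as follows. Define the function G(x), for $0 < x < \infty$, by the rules

$$G(x) = \begin{cases}
1, & \text{if } 0 < x \le \tfrac{5}{7};\\
\tfrac12 + \tfrac18 G(8x - 5), & \text{if } \tfrac{5}{7} < x \le \tfrac34;\\
\tfrac12 G(2x - 1), & \text{if } \tfrac34 < x \le 1;\\
0, & \text{if } 1 < x < \infty.
\end{cases}$$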
(See Fig. 38.) Since $R_n(i,j) = R_n(j,i)$ and since $R_n(0,j) = M(1,j)$, we may assume that $1 \le i \le j \le n$. Let $p = \lfloor\lg i\rfloor$, $q = \lfloor\lg j\rfloor$, $r = \lfloor\lg n\rfloor$, and let $t = n - 2^r + 1$. Then

$$R_n(i,j) = p + q + S_n(i,j) + T_n(i,j),$$

where $S_n$ and $T_n$ are functions that are either 0 or 1:

Sn(i,j)  = 1 if  and  only  if  q < r or  (i  - 2P  > u and  j - 2r  > u), 

Tn(i,j)  = 1 if  and  only  if  p < r or  (t  > f 2r“2  and  i-  2r  > v), 

where $u = 2^p G(t/2^p)$ and $v = 2^{r-2} G(t/2^{r-2})$.

(This may be the most formidable recurrence relation that will ever be solved!)

[Plot of G(x) for x from 0.0 to 1.3; vertical axis from 0.0 to 1.0.]

Fig. 38. Graham's function (see exercise 13).

[41]  (F.  K.  Hwang.)  Let  h3k  = |_§f  2fcJ  - 1,  h3k+ 1 = h3k  + 3 • 2k~3 , h3k+2  = 
LV  2 — f J for  k > 3,  and  let  the  initial  values  be  defined  so  that 

$$(h_0, h_1, h_2, \ldots) = (1, 1, 2, 2, 3, 4, 5, 7, 9, 11, 14, 18, 23, 29, 38, 48, 60, 76, \ldots).$$

Prove that $M(3,h_t) > t$ and $M(3,h_t - 1) \le t$ for all t, thereby establishing the exact values of M(3,n) for all n.

15. [12] Step H1 of the binary merge algorithm may require the calculation of the expression $\lfloor\lg(n/m)\rfloor$, for $n \ge m$. Explain how to compute this easily without division or calculation of a logarithm.

16. [18] For which m and n is Hwang and Lin's binary merging algorithm optimum, for $1 \le m \le n \le 10$?

17. [M25] Prove (21). [Hint: The inequality isn't very tight.]

18. [M40] Study the average number of comparisons used by binary merge.

► 19. [23] Prove that the M function satisfies (22).

20. [20] Show that if $M(m,n+1) \le M(m+1,n)$ for all $m \le n$, then $M(m,n+1) \le 1 + M(m,n)$ for all $m \le n$.

21. [M47] Prove or disprove (23) and (24).
22. [M43] Study the minimum average number of comparisons needed to merge m things with n.

23. [M31] (E. Reingold.) Let $\{A_1,\ldots,A_n\}$ and $\{B_1,\ldots,B_n\}$ be sets containing n elements each. Consider an algorithm that attempts to test equality of these two sets solely by making comparisons for equality between elements. Thus, the algorithm asks questions of the form “Is $A_i = B_j$?” for certain i and j, and it branches depending on the answer.

By defining a suitable adversary, prove that any such algorithm must make at least $\tfrac12 n(n+1)$ comparisons in its worst case.

24. [22] (E. L. Lawler.) What is the maximum number of comparisons needed by the following algorithm for merging m elements with $n \ge m$ elements? “Set $t \leftarrow \lfloor\lg(n/m)\rfloor$ and use Algorithm 5.2.4M to merge $A_1, A_2, \ldots, A_m$ with $B_{2^t}, B_{2\cdot 2^t}, \ldots, B_{q\cdot 2^t}$, where $q = \lfloor n/2^t\rfloor$. Then insert each $A_j$ into its proper place among the $B_k$.”

► 25. [25] Suppose $(x_{ij})$ is an m × n matrix with nondecreasing rows and columns: $x_{ij} \le x_{(i+1)j}$ for $1 \le i < m$ and $x_{ij} \le x_{i(j+1)}$ for $1 \le j < n$. Show that M(m,n) is the minimum number of comparisons needed to determine whether a given number x is present in the matrix, if all comparisons are between x and some matrix element.

*5.3.3. Minimum-Comparison Selection

A similar class of interesting problems arises when we look for best possible procedures to select the tth largest of n elements.

The history of this question goes back to Rev. C. L. Dodgson's amusing (though serious) essay on lawn tennis tournaments, which appeared in St. James's Gazette, August 1, 1883, pages 5-6. Dodgson (who is of course better known as Lewis Carroll) was concerned about the unjust manner in which prizes were awarded in tennis tournaments. Consider, for example, Fig. 39, which shows a typical “knockout tournament” between 32 players labeled 01, 02, ..., 32. In the finals, player 01 defeats player 05, so it is clear that player 01 is the champion and deserves the first prize. The inequity arises because player 05 usually gets second prize, although someone else might well be the second best. You can win second prize even if you are worse than half of the players in the competition! In fact, as Dodgson observed, the second-best player wins second prize if and only if the champion and the next-best are originally in opposite halves of the tournament; this occurs with probability $2^{n-1}/(2^n - 1)$, when there are $2^n$ competitors, so the wrong player receives second prize almost half of the time. If the losers of the semifinal round (players 25 and 17 in Fig. 39) compete for third prize, it is highly unlikely that the third-best player receives third prize.
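Dodgson's probability is easy to confirm by exhaustive enumeration for small brackets. The sketch below assumes a transitive ranking, so the better-ranked player always wins (rank 1 is best), and a uniformly random initial placement of the $2^n$ players. It counts the placements in which the losing finalist is the second-best player. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def prob_second_best_is_runner_up(num_players):
    """Exact probability, over all equally likely initial placements, that the
    second-best player loses the final of a knockout tournament.
    Players are ranks 1..num_players (1 = best); num_players must be a power of 2."""
    hits = total = 0
    for draw in permutations(range(1, num_players + 1)):
        field = list(draw)
        while len(field) > 2:                       # play rounds until only the finalists remain
            field = [min(field[i], field[i + 1]) for i in range(0, len(field), 2)]
        total += 1
        hits += max(field) == 2                     # the weaker finalist gets second prize
    return Fraction(hits, total)

print(prob_second_best_is_runner_up(8))   # 4/7, which is 2^(n-1)/(2^n - 1) for n = 3
```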

Dodgson therefore set out to design a tournament that determines the true second- and third-best players, assuming a transitive ranking. (In other words, if player A beats player B and B beats C, Dodgson assumed that A would beat C.) He devised a procedure in which losers are allowed to play further games until they are known to be definitely inferior to three other players. An example of Dodgson's scheme appears in Fig. 40, which is a supplementary tournament to be run in conjunction with Fig. 39. He tried to pair off players whose records in previous rounds were equivalent; he also tried to avoid matches in which both players had been defeated by the same person. In this particular example, 16 loses to 11 and 13 loses to 12 in Round 1; after 13 beats 16 in the second round, we can eliminate 16, who is now known to be inferior to 11, 12, and 13. In Round 3 Dodgson did not allow 19 to play with 21, since they have both been defeated by 18 and we could not automatically eliminate the loser of 19 versus 21.
Fig. 39. A knockout tournament with 32 players.


It would be nice to report that Lewis Carroll's tournament turns out to be optimal, but unfortunately that is not the case. His diary entry for July 23, 1883, says that he composed the essay in about six hours, and he felt “we are now so late in the [tennis] season that it is better it should appear soon than be written well.” His procedure makes more comparisons than necessary, and it is not formulated precisely enough to qualify as an algorithm. On the other hand, it has some rather interesting aspects from the standpoint of parallel computation. And it appears to be an excellent plan for a tennis tournament, because he built in some dramatic effects; for example, he specified that the two finalists should sit out round 5, playing an extended match during rounds 6 and 7. But tournament directors presumably thought the proposal was too logical, and so Carroll's system has apparently never been tried. Instead, a method of “seeding” is used to keep the supposedly best players in different parts of the tree.
Fig. 40. Lewis Carroll's lawn tennis tournament (played in conjunction with Fig. 39).
In a mathematical seminar during 1929-1930, Hugo Steinhaus posed the problem of finding the minimum number of tennis matches required to determine the first and second best players in a tournament, when there are $n \ge 2$ players in all. J. Schreier [Mathesis Polska 7 (1932), 154-160] gave a procedure that requires at most $n - 2 + \lceil\lg n\rceil$ matches, using essentially the same method as the first two stages in what we have called tree selection sorting (see Section 5.2.3, Fig. 23), avoiding redundant comparisons that involve $-\infty$. Schreier also claimed that $n - 2 + \lceil\lg n\rceil$ is best possible, but his proof was incorrect, as was another attempted proof by J. Słupecki [Colloquium Mathematicum 2 (1951), 286-290]. Thirty-two years went by before a correct, although rather complicated, proof was finally published by S. S. Kislitsyn [Sibirskii Mat. Zhurnal 5 (1964), 557-564].
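Here is a sketch of a procedure in the spirit of Schreier's, in Python. A knockout tournament is played round by round, with a bye for an odd player out, and is followed by a playoff among the players the champion defeated, since only they can be second best. Distinct keys are assumed and the names are illustrative. The champion plays at most $\lceil\lg n\rceil$ matches, so the total never exceeds $n - 2 + \lceil\lg n\rceil$.

```python
from itertools import permutations

def first_and_second(xs):
    """Largest and second-largest of the distinct keys xs (len(xs) >= 2).
    Returns (largest, second largest, comparisons)."""
    comps = 0
    field = [(x, []) for x in xs]        # each entry: (key, keys it has beaten directly)
    while len(field) > 1:
        nxt = []
        for i in range(0, len(field) - 1, 2):
            (x, bx), (y, by) = field[i], field[i + 1]
            comps += 1
            if x > y:
                bx.append(y)
                nxt.append((x, bx))
            else:
                by.append(x)
                nxt.append((y, by))
        if len(field) % 2 == 1:
            nxt.append(field[-1])        # odd player out gets a bye this round
        field = nxt
    champion, beaten = field[0]
    second = beaten[0]
    for y in beaten[1:]:                 # only the champion's victims can be second best
        comps += 1
        if y > second:
            second = y
    return champion, second, comps

for n in range(2, 7):
    worst = max(first_and_second(p)[2] for p in permutations(range(n)))
    print(n, worst, n - 2 + (n - 1).bit_length())   # (n - 1).bit_length() == ceil(lg n)
```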

Let $V_t(n)$ denote the minimum number of comparisons needed to determine the tth largest of n elements, for $1 \le t \le n$, and let $W_t(n)$ be the minimum number required to determine the largest, second largest, ..., and the tth largest, collectively. By symmetry, we have

$$V_t(n) = V_{n+1-t}(n), \qquad (1)$$

and it is obvious that

$$V_1(n) = W_1(n), \qquad (2)$$

$$V_t(n) \le W_t(n), \qquad (3)$$

$$W_n(n) = W_{n-1}(n) = S(n). \qquad (4)$$

We have observed in Lemma 5.2.3M that

$$V_1(n) = n - 1. \qquad (5)$$

In fact, there is an astonishingly simple proof of this fact, since everyone in a tournament except the champion must lose at least one game! By extending this idea and using an “adversary” as in Section 5.3.2, we can prove the Schreier-Kislitsyn theorem without much difficulty:

Theorem S. $V_2(n) = W_2(n) = n - 2 + \lceil\lg n\rceil$, for $n \ge 2$.

Proof. Assume that n players have participated in a tournament that has determined the second-best player by some given procedure, and let $a_j$ be the number of players who have lost j or more matches. The total number of matches played is then $a_1 + a_2 + a_3 + \cdots$. We cannot determine the second-best player without also determining the champion (see exercise 2), so our previous argument shows that $a_1 = n - 1$. To complete the proof, we will show that there is always some sequence of outcomes of the matches that makes $a_2 \ge \lceil\lg n\rceil - 1$.

Suppose that at the end of the tournament the champion has played (and beaten) p players; one of these is the second best, and the others must have lost at least one other time, so $a_2 \ge p - 1$. Therefore we can complete the proof by constructing an adversary who decides the results of the games in such a way that the champion must play at least $\lceil\lg n\rceil$ other people.

Let the adversary declare A to be better than B if A is previously undefeated and B has lost at least once, or if both are undefeated and B has won fewer matches than A at that time. In other circumstances the adversary may make an arbitrary decision consistent with some partial ordering.

Consider the outcome of a complete tournament whose matches have been decided by such an adversary. Let us say that “A supersedes B” if and only if A = B or A supersedes the player who first defeated B. (Only a player's first defeat is relevant in this relation; a loser's subsequent games are ignored. According to the mechanism of the adversary, any player who first defeats another must be previously unbeaten.) It follows that a player who won the first p matches supersedes at most $2^p$ players on the basis of those p contests. (This is clear for p = 0, and for p > 0 the pth match was against someone who was either previously beaten or who supersedes at most $2^{p-1}$ players.) Hence the champion, who supersedes everyone, must have played at least $\lceil\lg n\rceil$ matches. ∎

Theorem S completely resolves the problem of finding the second-best player, in the minimax sense. Exercise 6 shows, in fact, that it is possible to give a simple formula for the minimum number of comparisons needed to find the second largest element of a set when an arbitrary partial ordering of the elements is known beforehand.

What if t > 2? In the paper cited above, Kislitsyn went on to consider larger values of t, proving that

$$W_t(n) \le n - t + \sum_{n+1-t < j \le n} \lceil\lg j\rceil, \qquad \text{for } n \ge t. \qquad (6)$$

For t = 1 and t = 2 we have seen that equality actually holds in this formula; for t = 3 it can be slightly improved (see exercise 21).

We shall prove Kislitsyn's theorem by showing that the first t stages of tree selection require at most $n - t + \sum_{n+1-t < j \le n} \lceil\lg j\rceil$ comparisons, ignoring all of the comparisons that involve $-\infty$. It is interesting to note that, by Eq. 5.3.1–(3), the right-hand side of (6) equals B(n) when t = n, and also when t = n − 1; hence tree selection and binary insertion yield the same upper bound for the sorting problem, although they are quite different methods.

Let α be an extended binary tree with n external nodes, and let π be a permutation of {1, 2, ..., n}. Place the elements of π into the external nodes, from left to right in symmetric order, and fill in the internal nodes according to the rules of a knockout tournament as in tree selection. When the resulting tree is subjected to repeated selection operations, it defines a sequence $c_{n-1} c_{n-2} \ldots c_1$, where $c_j$ is the number of comparisons required to bring element j to the root of the tree when element j + 1 has been replaced by $-\infty$. For example, if α is the tree


(7) 
If π had been 3 1 5 4 2, the sequence $c_4 c_3 c_2 c_1$ would have been 2 1 1 0 instead. It is not difficult to see that $c_1$ is always zero.

Let $\mu(\alpha,\pi)$ be the multiset $\{c_{n-1}, c_{n-2}, \ldots, c_1\}$ determined by α and π. If α has left subtree α′ and right subtree α″, and if elements 1 and 2 do not both appear in α′ or both in α″, it is easy to see that

$$\mu(\alpha,\pi) = \bigl(\mu(\alpha',\pi') + 1\bigr) \uplus \bigl(\mu(\alpha'',\pi'') + 1\bigr) \uplus \{0\} \qquad (8)$$

for appropriate permutations π′ and π″, where μ + 1 denotes the multiset obtained by adding 1 to each element of μ. (See exercise 7.) On the other hand, if elements 1 and 2 both appear in α′, we have

$$\mu(\alpha,\pi) = \bigl(\mu(\alpha',\pi') + \epsilon\bigr) \uplus \bigl(\mu(\alpha'',\pi'') + 1\bigr) \uplus \{0\},$$

where μ + ε denotes a multiset obtained by adding 1 to some elements of μ and 0 to the others. A similar formula holds when 1 and 2 both appear in α″. Let us say that multiset $\mu_1$ dominates $\mu_2$ if both $\mu_1$ and $\mu_2$ contain the same number of elements, and if the kth largest element of $\mu_1$ is greater than or equal to the kth largest element of $\mu_2$ for all k; and let us define μ(α) to be the dominant μ(α, π), taken over all permutations π, in the sense that μ(α) dominates μ(α, π) for all π and μ(α) = μ(α, π) for some π. The formulas above show that

$$\mu(\square) = \emptyset, \qquad \mu(\alpha) = \bigl(\mu(\alpha') + 1\bigr) \uplus \bigl(\mu(\alpha'') + 1\bigr) \uplus \{0\}; \qquad (9)$$

hence μ(α) is the multiset of all distances from the root to the internal nodes of α.

The reader who has followed this train of thought will now see that we are ready to prove Kislitsyn's theorem (6). Indeed, $W_t(n)$ is less than or equal to n − 1 plus the sum of the t − 1 largest elements of μ(α), where α is any tree being used in tree selection sorting. We may take α to be the complete binary tree with n external nodes (see Section 2.3.4.5), when

$$\mu(\alpha) = \{\lfloor\lg 1\rfloor, \lfloor\lg 2\rfloor, \ldots, \lfloor\lg(n-1)\rfloor\} = \{\lceil\lg 2\rceil - 1, \lceil\lg 3\rceil - 1, \ldots, \lceil\lg n\rceil - 1\}. \qquad (10)$$

Formula (6) follows when we consider the t − 1 largest elements of this multiset.
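To see the bound (6) in action, the following sketch runs the first t stages of tree selection on the complete binary tree with n external nodes. The tree is stored in the usual array layout, with internal node k having children 2k and 2k + 1, so node k lies at distance $\lfloor\lg k\rfloor$ from the root. Each winner is replaced by $-\infty$ (represented here by `None`), and comparisons against $-\infty$ are not counted. Distinct keys are assumed and the names are hypothetical; `kislitsyn_bound` simply evaluates the right-hand side of (6).

```python
from itertools import permutations

def tree_selection(xs, t):
    """First t stages of tree selection on a complete binary tree with len(xs)
    leaves, not counting comparisons with -infinity (None).
    Returns (the t largest keys in decreasing order, comparisons)."""
    n = len(xs)
    key = [None] * n + list(xs)                # leaves at positions n..2n-1
    win = [None] * n + list(range(n, 2 * n))   # win[k]: leaf position of the winner at node k
    comps = 0

    def replay(k):
        nonlocal comps
        a, b = win[2 * k], win[2 * k + 1]
        if key[a] is None:
            win[k] = b                         # -infinity loses without a comparison
        elif key[b] is None:
            win[k] = a
        else:
            comps += 1
            win[k] = a if key[a] > key[b] else b

    for k in range(n - 1, 0, -1):              # knockout tournament: n - 1 comparisons
        replay(k)
    top = [key[win[1]]]
    for _ in range(t - 1):
        p = win[1]
        key[p] = None                          # replace the last winner by -infinity
        k = p // 2
        while k >= 1:                          # replay the matches on its path to the root
            replay(k)
            k //= 2
        top.append(key[win[1]])
    return top, comps

def ceil_lg(x):
    return (x - 1).bit_length()                # ceil(lg x) for integers x >= 1

def kislitsyn_bound(n, t):
    """Right-hand side of (6)."""
    return n - t + sum(ceil_lg(j) for j in range(n + 2 - t, n + 1))

n, t = 6, 3
worst = max(tree_selection(p, t)[1] for p in permutations(range(n)))
print(worst, kislitsyn_bound(n, t))
```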
Kislitsyn's theorem gives a good upper bound for $W_t(n)$; he remarked that $V_3(5) = 6 < W_3(5) = 7$, but he was unable to find a better bound for $V_t(n)$ than for $W_t(n)$. A. Hadian and M. Sobel discovered a way to do this using replacement selection instead of tree selection; their formula [Univ. of Minnesota, Dept. of Statistics Report 121 (1969)],

$$V_t(n) \le n - t + (t-1)\lceil\lg(n+2-t)\rceil, \qquad n \ge t, \qquad (11)$$

is similar to Kislitsyn's upper bound for $W_t(n)$ in (6), except that each term in the sum has been replaced by the smallest term.

Hadian and Sobel’s theorem (11) can be proved by using the following construction: First set up a binary tree for a knockout tournament on $n - t + 2$ items. (This takes $n - t + 1$ comparisons.) The largest item is greater than $n - t + 1$ others, so it can’t be $t$th largest. Replace it, where it appears at an external node of the tree, by one of the $t - 2$ elements held in reserve, and find the largest element of the resulting $n - t + 2$; this requires at most $\lceil\lg(n+2-t)\rceil$ comparisons, because we need to recompute only one path in the tree. Repeat this operation $t - 2$ times in all, for each element held in reserve. Finally, replace the currently largest element by $-\infty$, and determine the largest of the remaining $n + 1 - t$; this requires at most $\lceil\lg(n+2-t)\rceil - 1$ comparisons, and it brings the $t$th largest element of the original set to the root of the tree. Summing the comparisons yields (11).
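The construction can be traced with a minimal Python sketch. The names `hadian_sobel_select` and `bound_11` are ours; the sketch assumes distinct values and $2 \le t \le n$, stores the knockout tree in an array (node $i$ has children $2i$ and $2i+1$, with the $m = n - t + 2$ leaves at positions $m, \ldots, 2m-1$), and counts only comparisons between two real elements, so that a match against $-\infty$ costs nothing, as in the final step above.

```python
import random


def ceil_lg(x):
    """ceil(lg x) for an integer x >= 1."""
    return (x - 1).bit_length()


def hadian_sobel_select(xs, t):
    """Return (t-th largest of xs, comparisons made) by replacement selection.
    Assumes distinct values and 2 <= t <= len(xs)."""
    n = len(xs)
    m = n - t + 2                     # size of the knockout tournament
    items = list(xs[:m])              # items[j] = value at leaf j; None means -infinity
    reserve = list(xs[m:])            # the t - 2 elements held in reserve
    tree = [0] * (2 * m)              # tree[i] = leaf index of the winner at node i
    comparisons = 0

    def play(a, b):
        # Winner of leaves a and b; a match against -infinity is free.
        nonlocal comparisons
        if items[a] is None:
            return b
        if items[b] is None:
            return a
        comparisons += 1
        return a if items[a] > items[b] else b

    for j in range(m):                # leaves occupy positions m .. 2m-1
        tree[m + j] = j
    for i in range(m - 1, 0, -1):     # m - 1 comparisons build the tournament
        tree[i] = play(tree[2 * i], tree[2 * i + 1])

    def replace_champion(value):
        # Put `value` at the champion's leaf and replay only that leaf's path.
        j = tree[1]
        items[j] = value
        i = (m + j) // 2
        while i >= 1:
            tree[i] = play(tree[2 * i], tree[2 * i + 1])
            i //= 2

    for value in reserve:             # t - 2 replacements
        replace_champion(value)
    replace_champion(None)            # finally the champion becomes -infinity
    return items[tree[1]], comparisons


def bound_11(n, t):
    return n - t + (t - 1) * ceil_lg(n + 2 - t)


rng = random.Random(1)
for n in range(2, 13):
    for t in range(2, n + 1):
        for _ in range(50):
            xs = rng.sample(range(1000), n)
            answer, used = hadian_sobel_select(xs, t)
            assert answer == sorted(xs, reverse=True)[t - 1]
            assert used <= bound_11(n, t)
```

Each removed champion beats the $n - t + 1$ other items in the tree, so the $t - 1$ removed elements are exactly the $t - 1$ largest, which is why the final root holds the $t$th largest.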

In relation (11) we should of course replace $t$ by $n + 1 - t$ on the right-hand side whenever $n + 1 - t$ gives a better value (as when $n = 6$ and $t = 3$). Curiously, the formula gives a smaller bound for $V_7(13)$ than it does for $V_6(13)$. The upper bound in (11) is exact for $n \le 6$, but as $n$ and $t$ get larger it is possible to obtain much better estimates of $V_t(n)$.

For example, the following elegant method (due to David G. Doren) can be used to show that $V_4(8) \le 12$. Let the elements be $X_1, \ldots, X_8$; first compare $X_1\!:\!X_2$ and $X_3\!:\!X_4$ and the two winners, and do the same to $X_5\!:\!X_6$ and $X_7\!:\!X_8$ and their winners. Relabel elements so that $X_1 < X_2 < X_4 > X_3$, $X_5 < X_6 < X_8 > X_7$, then compare $X_2\!:\!X_6$; by symmetry assume that $X_2 < X_6$, so that we have the configuration

[Diagram: $X_1 < X_2 < X_4 > X_3$, $X_5 < X_6 < X_8 > X_7$, and $X_2 < X_6$.]

(Now $X_1$ and $X_8$ are out of contention and we must find the third largest of $\{X_2, \ldots, X_7\}$.) Compare $X_2\!:\!X_7$, and discard the smaller; in the worst case we have $X_2 < X_7$ and we must find the third largest of

[Diagram: $X_3 < X_4$, $X_5 < X_6$, and $X_7$.]

This can be done in $V_3(5) - 2 = 4$ more steps, since the procedure of (11) that achieves $V_3(5) = 6$ begins by comparing two disjoint pairs of elements.
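Tallying the steps just described (the grouping below is only our bookkeeping): each of the two four-element knockouts uses 3 comparisons, $X_2\!:\!X_6$ and $X_2\!:\!X_7$ use one each, and the finish uses $V_3(5) - 2 = 4$, so

$$\underbrace{3 + 3}_{\text{two knockouts}} + \underbrace{1}_{X_2:X_6} + \underbrace{1}_{X_2:X_7} + \underbrace{4}_{\text{finish}} = 12.$$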
Table 1

VALUES OF $V_t(n)$ FOR SMALL $n$

 n   V_1(n) V_2(n) V_3(n) V_4(n) V_5(n) V_6(n) V_7(n) V_8(n) V_9(n) V_10(n)
 1   0
 2   1      1
 3   2      3      2
 4   3      4      4      3
 5   4      6      6      6      4
 6   5      7      8      8      7      5
 7   6      8      10     10*    10     8      6
 8   7      9      11     12     12     11     9      7
 9   8      11     12     14     14*    14     12     11     8
10   9      12     14*    15     16**   16**   15     14*    12     9

* Exercises 10–12 give constructions that improve on Eq. (11) in these cases.
** See K. Noshita, Trans. of the IECE of Japan E59,12 (December 1976), 17–18.

Other tricks of this kind can be used to produce the results shown in Table 1; no general method is evident as yet. The values listed for $V_4(9) = V_6(9)$ and $V_5(10) = V_6(10)$ were proved optimum in 1996 by W. Gasarch, W. Kelly, and W. Pugh [SIGACT News 27,2 (June 1996), 88–96], using a computer search.

A fairly good lower bound for the selection problem when $t$ is small was obtained by David G. Kirkpatrick [JACM 28 (1981), 150–165]: If $2 \le t \le (n+1)/2$, we have

$$V_t(n) \ge n + t - 3 + \sum_{j=0}^{t-2} \Bigl\lceil \lg \frac{n-t+2}{t+j} \Bigr\rceil. \tag{12}$$

In his Ph.D. thesis [U. of Toronto, 1974], Kirkpatrick also proved that

$$V_3(n) \le n + 1 + \Bigl\lceil \lg \frac{n-1}{4} \Bigr\rceil + \Bigl\lceil \lg \frac{n-1}{5} \Bigr\rceil; \tag{13}$$

this upper bound matches the lower bound (12) for $\lg\frac53 \approx 74\%$ of all integers $n$, and it exceeds (12) by at most 1. Kirkpatrick’s analysis made it natural to conjecture that equality holds in (13) for all $n > 4$, but Jutta Eusterbrock found the surprising counterexample $V_3(22) = 28$ [Discrete Applied Math. 41 (1993), 131–137]. Improved lower bounds for larger values of $t$ were found by S. W. Bent and J. W. John (see exercise 27):

$$V_t(n) \ge n + m - 2\lceil\sqrt{m}\,\rceil, \qquad m = 2 + \Bigl\lceil \lg\Bigl(\binom{n}{t}\Bigm/(n+1-t)\Bigr) \Bigr\rceil. \tag{14}$$

This formula proves in particular that

$$V_{\alpha n}(n) \ge \Bigl(1 + \alpha\lg\frac{1}{\alpha} + (1-\alpha)\lg\frac{1}{1-\alpha}\Bigr)n + O(\sqrt{n}). \tag{15}$$
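The bounds (11), (12), and (13) can be checked against Table 1 mechanically. In the minimal Python sketch below the function names are ours, the table values are transcribed from Table 1, and `fractions.Fraction` keeps $\lceil\lg x\rceil$ exact when $x$ is a ratio such as $(n-1)/5$. It confirms the claims made in the text for these small cases: (11) is exact for $n \le 6$, (12) never exceeds the table, (13) reproduces $V_3(n)$ for $5 \le n \le 10$, and (11) favors $V_7(13)$ over $V_6(13)$.

```python
from fractions import Fraction


def ceil_lg(x):
    """Exact ceil(lg x) for a positive rational x (the result may be 0 or negative)."""
    x = Fraction(x)
    k = 0
    while Fraction(2) ** k < x:
        k += 1
    while Fraction(2) ** (k - 1) >= x:
        k -= 1
    return k


def upper_11(n, t):
    """Hadian-Sobel bound (11), using t or n + 1 - t, whichever is better."""
    def raw(s):
        return n - s + (s - 1) * ceil_lg(n + 2 - s)
    return min(raw(t), raw(n + 1 - t))


def lower_12(n, t):
    """Kirkpatrick's lower bound (12); stated for 2 <= t <= (n + 1)/2."""
    return n + t - 3 + sum(ceil_lg(Fraction(n - t + 2, t + j)) for j in range(t - 1))


def upper_13(n):
    """Kirkpatrick's upper bound (13) for V_3(n)."""
    return n + 1 + ceil_lg(Fraction(n - 1, 4)) + ceil_lg(Fraction(n - 1, 5))


# TABLE1[n] = [V_1(n), ..., V_n(n)], transcribed from Table 1.
TABLE1 = {
    1: [0],
    2: [1, 1],
    3: [2, 3, 2],
    4: [3, 4, 4, 3],
    5: [4, 6, 6, 6, 4],
    6: [5, 7, 8, 8, 7, 5],
    7: [6, 8, 10, 10, 10, 8, 6],
    8: [7, 9, 11, 12, 12, 11, 9, 7],
    9: [8, 11, 12, 14, 14, 14, 12, 11, 8],
    10: [9, 12, 14, 15, 16, 16, 15, 14, 12, 9],
}

for n, row in TABLE1.items():
    for t, v in enumerate(row, start=1):
        assert v <= upper_11(n, t)
        if n <= 6:
            assert v == upper_11(n, t)        # (11) is exact for n <= 6
        if 2 <= t <= (n + 1) / 2:
            assert lower_12(n, t) <= v
    if n >= 5:
        assert upper_13(n) == row[2]          # (13) agrees with V_3(n) here

print(upper_11(13, 7), upper_11(13, 6))     # 24 26
print(upper_13(22))                          # 29
```

The last line shows why Eusterbrock’s $V_3(22) = 28$ is a counterexample: (13) gives 29 there.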
A linear method. When $n$ is odd and $t = \lceil n/2\rceil$, the $t$th largest (and $t$th smallest) element is called the median. According to (11), we can find the median of $n$ elements in $\approx \frac12 n\lg n$ comparisons; but this is only about twice as fast as sorting, even though we are asking for much less information. For several years, concerted efforts were made by a number of people to find an improvement over (11) when $t$ and $n$ are large. Finally in 1971, Manuel Blum discovered a method that needed only $O(n\log\log n)$ steps. Blum’s approach to the problem suggested a new class of techniques, which led to the following construction due to R. Rivest and R. Tarjan [J. Comp. and Sys. Sci. 7 (1973), 448–461]:

Theorem L. If $n > 32$ and $1 \le t \le n$, we have $V_t(n) \le 15n - 163$.

Proof. The theorem is trivial when $n$ is small, since $V_t(n) \le S(n) \le 10n \le 15n - 163$ for $32 < n \le 2^{10}$. By adding at most 13 dummy $-\infty$ elements, we may assume that $n = 7(2q + 1)$ for some integer $q \ge 73$. The following method may now be used to select the $t$th largest:

Step 1. Divide the elements into $2q + 1$ groups of seven elements each, and sort each of the groups. This takes at most $13(2q + 1)$ comparisons.

Step 2. Find the median of the $2q + 1$ median elements obtained in Step 1, and call it $x$. By induction on $q$, this takes at most $V_{q+1}(2q + 1) \le 30q - 148$ comparisons.

Step 3. The $n - 1$ elements other than $x$ have now been partitioned into three sets (see Fig. 41):

$4q + 3$ elements known to be greater than $x$ (Region B);

$4q + 3$ elements known to be less than $x$ (Region C);

$6q$ elements whose relation to $x$ is unknown (Regions A and D).

By making $4q$ additional comparisons, we can tell exactly which of the elements in regions A and D are less than $x$. (We first test $x$ against the middle element of each triple.)

Step 4. We have now found $r$ elements greater than $x$ and $n - 1 - r$ elements less than $x$, for some $r$. If $t = r + 1$, $x$ is the answer; if $t < r + 1$, we need to find the $t$th largest of the $r$ large elements; and if $t > r + 1$, we need to find the $(t - 1 - r)$th largest of the $n - 1 - r$ small elements. The point is that $r$ and $n - 1 - r$ are both less than or equal to $10q + 3$ (the size of regions A and D, plus either B or C). By induction on $q$ this step therefore requires at most $15(10q + 3) - 163$ comparisons.

The total number of comparisons comes to at most

$$13(2q + 1) + 30q - 148 + 4q + 15(10q + 3) - 163 = 15(14q - 6) - 163.$$

Since we started with at least $14q - 6$ elements, the proof is complete. ∎
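The proof is constructive, and its outline translates directly into a recursive procedure. Below is a minimal Python sketch under simplifying assumptions of ours: `select_largest` is a hypothetical name, `sorted()` stands in for the 13-comparison sort of seven elements, Step 3 is a plain partition against $x$ rather than the $4q$-comparison refinement, and inputs with $n \le 32$ are simply sorted. It shows the recursion structure only; it does not count comparisons or attain the $15n - 163$ bound.

```python
import random


def select_largest(xs, t):
    """Return the t-th largest element of xs, 1 <= t <= len(xs).
    Repeated values are allowed."""
    n = len(xs)
    if n <= 32:                                    # small case: just sort
        return sorted(xs, reverse=True)[t - 1]
    # Step 1: sort groups of (at most) seven elements.
    groups = [sorted(xs[i:i + 7]) for i in range(0, n, 7)]
    # Step 2: x is the median of the group medians, found recursively.
    medians = [g[len(g) // 2] for g in groups]
    x = select_largest(medians, (len(medians) + 1) // 2)
    # Step 3: compare every element with x.
    greater = [y for y in xs if y > x]
    less = [y for y in xs if y < x]
    r = len(greater)
    equal = n - r - len(less)                      # copies of x (1 if values are distinct)
    # Step 4: continue on the side that contains the answer.
    if t <= r:
        return select_largest(greater, t)
    if t <= r + equal:
        return x
    return select_largest(less, t - r - equal)


rng = random.Random(3)
data = [rng.randrange(1000) for _ in range(500)]
desc = sorted(data, reverse=True)
for t in (1, 100, 250, 500):
    assert select_largest(data, t) == desc[t - 1]
```

With distinct values `equal` is 1, and the last branch asks for the $(t - 1 - r)$th largest of the small elements, exactly as in Step 4.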

Theorem L shows that selection can always be done in linear time, namely that $V_t(n) = O(n)$. Of course, the method used in this proof is rather crude, since it throws away good information in Step 4. Deeper study of the problem has led to much sharper bounds; for example, A. Schönhage, M. Paterson, and N. Pippenger [J. Comp. Sys. Sci. 13 (1976), 184–199] proved that the maximum number of comparisons required to find the median is at most $3n + O((n\log n)^{3/4})$. See exercise 23 for a lower bound and for references to more recent results.

Fig. 41. The selection algorithm of Rivest and Tarjan ($q = 4$). [Diagram of Regions A, B, C, and D.]

The average number. Instead of minimizing the maximum number of comparisons, we can ask instead for an algorithm that minimizes the average number of comparisons, assuming random order. As usual, the minimean problem is considerably harder than the minimax problem; indeed, the minimean problem is still unsolved even in the case $t = 2$. Claude Picard mentioned the problem in his book Théorie des Questionnaires (1965), and an extensive exploration was undertaken by Milton Sobel [Univ. of Minnesota, Dept. of Statistics Reports 113 and 114 (November 1968); Revue Française d’Automatique, Informatique et Recherche Opérationnelle 6,R-3 (December 1972), 23–68].

Sobel constructed the procedure of Fig. 42, which finds the second largest of six elements using only 6½ comparisons on the average. In the worst case, 8 comparisons are required, and this is worse than $V_2(6) = 7$; in fact, an
exhaustive  computer  search  by  D.  Hoey  has  shown  that  the  best  procedure  for 
this  problem,  if  restricted  to  at  most  7 comparisons,  uses  6||  comparisons  on 
the  average.  Thus  no  procedure  that  finds  the  second  largest  of  six  elements  can 
be  optimum  in  both  the  minimax  and  the  minimean  senses  simultaneously. 

Let $\overline{V}_t(n)$ denote the minimum average number of comparisons needed to find the $t$th largest of $n$ elements. Table 2 shows the exact values for small $n$, as computed by D. Hoey.

R. W. Floyd discovered in 1970 that the median of $n$ elements can be found with only $\frac32 n + O(n^{2/3}\log n)$ comparisons, on the average. He and R. L. Rivest refined this method a few years later and constructed an elegant algorithm to prove that

$$\overline{V}_t(n) \le n + \min(t, n-t) + O\bigl(\sqrt{n\log n}\,\bigr). \tag{16}$$

(See exercises 13 and 24.)
Fig. 42. A procedure that selects the second largest of $\{X_1, X_2, X_3, X_4, X_5, X_6\}$, using 6½ comparisons on the average. Each “symmetrical” branch is identical to its sibling, with names permuted in some appropriate manner. External nodes contain “$j\,k$” when $X_j$ is known to be the second largest and $X_k$ the largest; the number of permutations leading to such a node appears immediately below it.

Using another approach, based on a generalization of one of Sobel’s constructions for $t = 2$, David W. Matula [Washington Univ. Tech. Report AMCS-73-9 (1973)] showed that

$$\overline{V}_t(n) \le n + t\lceil\lg t\rceil(11 + \ln\ln n). \tag{17}$$

Thus, for fixed $t$ the average amount of work can be reduced to $n + O(\log\log n)$ comparisons. An elegant lower bound on $\overline{V}_t(n)$ appears in exercise 25.

The sorting and selection problems are special cases of the much more general problem of finding a permutation of $n$ given elements that is consistent with a given partial ordering. A. C. Yao [SICOMP 18 (1989), 679–689] has shown that, if the partial ordering is defined by an acyclic digraph $G$ on $n$ vertices with $k$ connected components, the minimum number of comparisons necessary to solve such problems is always $\Theta\bigl(\lg(n!/T(G)) + n - k\bigr)$, in both the worst case and on the average, where $T(G)$ is the total number of permutations consistent with the partial ordering (the number of topological sortings of $G$).
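For small digraphs the quantity $\lg(n!/T(G)) + n - k$ can be computed directly. The minimal Python sketch below (all names are ours) counts topological sortings by dynamic programming over the set of vertices already placed, which is practical only for small $n$; it does not reproduce the constant hidden in Yao’s $\Theta$. As an example it uses a hypothetical `selection_poset` in which vertex 0 is the $t$th largest, so that $T(G) = (t-1)!\,(n-t)!$.

```python
from functools import lru_cache
from math import factorial, log2


def count_topological_sortings(n, edges):
    """T(G): orderings of vertices 0..n-1 in which u precedes v for every
    edge (u, v).  Dynamic programming over the set of vertices already placed."""
    preds = [0] * n
    for u, v in edges:
        preds[v] |= 1 << u

    @lru_cache(maxsize=None)
    def ways(placed):
        if placed == (1 << n) - 1:
            return 1
        total = 0
        for v in range(n):
            free = not (placed >> v) & 1
            if free and (preds[v] & ~placed) == 0:
                total += ways(placed | (1 << v))
        return total

    return ways(0)


def components(n, edges):
    """Number k of connected components of G, ignoring edge directions."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(a) for a in range(n)})


def yao_quantity(n, edges):
    """lg(n!/T(G)) + n - k, the expression inside Yao's bound."""
    return (log2(factorial(n)) - log2(count_topological_sortings(n, edges))
            + n - components(n, edges))


def selection_poset(n, t):
    """Vertex 0 is the t-th largest: vertices 1..n-t lie below it, the rest above."""
    below = range(1, n - t + 1)
    above = range(n - t + 1, n)
    return [(b, 0) for b in below] + [(0, a) for a in above]


print(count_topological_sortings(6, selection_poset(6, 3)))   # 12 = 3! * 2!
```

With no edges at all, $T(G) = n!$ and $k = n$, so the quantity is 0, as it should be when any permutation is acceptable.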

EXERCISES 

1. [15] In Lewis Carroll’s tournament (Figs. 39 and 40), why was player 13 eliminated in spite of winning in Round 3?
Table 2

MINIMUM AVERAGE COMPARISONS FOR SELECTION

 n   V̄_1(n)  V̄_2(n)  V̄_3(n)  V̄_4(n)  V̄_5(n)  V̄_6(n)  V̄_7(n)
 1   0
 2   1       1
 3   2       2 2/3   2
 4   3       4       4       3

5 

6 

7 

4 

5 

6 

5I7 

H 

7 149 
' 210 

513 

D15 

7— 

1 18 

0 509 
°630 

5 — 

D15 

7— 

‘ 18 

Q_32_ 

^ 105 

4 

0 509 
°630 

5 

7 149 

1 210 

6 

► 2. [M25] Prove that after we have found the $t$th largest of $n$ elements by a sequence of comparisons, we also know which $t - 1$ elements are greater than it, and which $n - t$ elements are less than it.

3. [20] Prove that $V_t(n) \ge V_t(n-1)$ and $W_t(n) \ge W_t(n-1)$, for $1 \le t < n$.

► 4. [M25] (F. Fussenegger and H. N. Gabow.) Prove that $W_t(n) \ge n - t + \lceil\lg n^{\underline{t-1}}\rceil$.

5. [10] Prove that $W_3(n) \le V_3(n) + 1$.

► 6. [M26] (R. W. Floyd.) Given $n$ distinct elements $\{X_1, \ldots, X_n\}$ and a set of relations $X_i < X_j$ for certain pairs $(i, j)$, we wish to find the second largest element. If we know that $X_i < X_j$ and $X_i < X_k$ for $j \ne k$, $X_i$ cannot possibly be the second largest, so it can be eliminated. The resulting relations now have a form such as

[diagram]

namely, $m$ groups of elements that can be represented by a multiset $\{l_1, l_2, \ldots, l_m\}$; the $j$th group contains $l_j + 1$ elements, one of which is known to be greater than the others. For example, the configuration above can be described by the multiset $\{0, 1, 2, 2, 3, 5\}$; when no relations are known we have a multiset of $n$ zeros.

Let $f(l_1, l_2, \ldots, l_m)$ be the minimum number of comparisons needed to find the second largest element of such a partially ordered set. Prove that

$$f(l_1, l_2, \ldots, l_m) = m - 2 + \bigl\lceil\lg(2^{l_1} + 2^{l_2} + \cdots + 2^{l_m})\bigr\rceil.$$

[Hint: Show that the best strategy is always to compare the largest elements of the two smallest groups, until reducing $m$ to unity; use induction on $l_1 + l_2 + \cdots + l_m + 2m$.]

7.  [M20]  Prove  (8). 

8. [M21] Kislitsyn’s formula (6) is based on tree selection sorting using the complete binary tree with $n$ external nodes. Would a tree selection method based on some other tree give a better bound, for any $t$ and $n$?

► 9.  [20]  Draw  a comparison  tree  that  finds  the  median  of  five  elements  in  at  most  six 
steps,  using  the  replacement-selection  method  of  Hadian  and  Sobel  [see  (11)]. 

10.  [35]  Show  that  the  median  of  seven  elements  can  be  found  in  at  most  10  steps. 
11. [38] (K. Noshita.) Show that the median of nine elements can be found in at
most  14  steps,  of  which  the  first  seven  are  identical  to  Doren’s  method. 

12. [21] (Hadian and Sobel.) Prove that $V_3(n) \le V_3(n-1) + 2$. [Hint: Start by discarding the smallest of $\{X_1, X_2, X_3, X_4\}$.]

► 13. [HM28] (R. W. Floyd.) Show that if we start by finding the median element of $\{X_1, \ldots, X_{n^{2/3}}\}$, using a recursively defined method, we can go on to find the median of $\{X_1, \ldots, X_n\}$ with an average of $\frac32 n + O(n^{2/3}\log n)$ comparisons.

► 14. [20] (M. Sobel.) Let $U_t(n)$ be the minimum number of comparisons needed to find the $t$ largest of $n$ elements, without necessarily knowing their relative order. Show that $U_2(5) \le 5$.

15. [22] (I. Pohl.) Suppose that we are interested in minimizing space instead of time. What is the minimum number of data words needed in memory in order to compute the $t$th largest of $n$ elements, if each element fills one word and if the elements are input one at a time into a single register?

► 16. [25] (I. Pohl.) Show that we can find both the maximum and the minimum of a set of $n$ elements, using at most $\lceil\frac32 n\rceil - 2$ comparisons; and the latter number cannot be lowered. [Hint: Any stage in such an algorithm can be represented as a quadruple $(a, b, c, d)$, where $a$ elements have never been compared, $b$ have won but never lost, $c$ have lost but never won, $d$ have both won and lost. Construct an adversary.]

17. [20] (R. W. Floyd.) Show that it is possible to select, in order, both the $k$ largest and the $l$ smallest elements of a set of $n$ elements, using at most $\lceil\frac32 n\rceil - k - l + \sum_{n+1-k < j \le n}\lceil\lg j\rceil + \sum_{n+1-l < j \le n}\lceil\lg j\rceil$ comparisons.

18.  [M20]  If  groups  of  size  5,  not  7,  had  been  used  in  the  proof  of  Theorem  L,  what 
theorem  would  have  been  obtained? 

19.  [M42]  Extend  Table  2 to  n = 8. 

20. [M47] What is the asymptotic value of $\overline{V}_2(n) - n$, as $n \to \infty$?

21. [32] (P. V. Ramanan and L. Hyafil.) Prove that $W_t(2^k + 2^{k+1-t}) \le 2^k + 2^{k+1-t} + (t-1)(k-1)$, when $k \ge t \ge 2$; also show that equality holds for infinitely many $k$ and $t$, because of exercise 4. [Hint: Maintain two knockout trees and merge their results cleverly.]

22. [24] (David G. Kirkpatrick.) Show that when $4\cdot 2^k < n - 1 \le 5\cdot 2^k$, the upper bound (11) for $V_3(n)$ can be reduced by 1 as follows: (i) Form four knockout trees of size $2^k$. (ii) Find the minimum of the four maxima, and discard all $2^k$ elements of its tree. (iii) Using the known information, build a single knockout tree of size $n - 1 - 2^k$. (iv) Continue as in the proof of (11).

23. [M49] What is the asymptotic value of $V_{\lceil n/2\rceil}(n)$, as $n \to \infty$?

24. [HM40] Prove that $\overline{V}_t(n) \le n + t + O(\sqrt{n\log n})$ for $t \le \lceil n/2\rceil$. Hint: Show that with this many comparisons we can in fact find both the $\lfloor t - \sqrt{t\ln n}\rfloor$th and $\lceil t + \sqrt{t\ln n}\rceil$th elements, after which the $t$th is easily located.

► 25. [M35] (W. Cunto and J. I. Munro.) Prove that $\overline{V}_t(n) \ge n + t - 2$ when $t \le \lfloor n/2\rfloor$.
26. [M32] (A. Schönhage, 1974.) (a) In the notation of exercise 14, prove that $U_t(n) \ge \min\bigl(2 + U_t(n-1),\ 2 + U_{t-1}(n-1)\bigr)$ for $n \ge 3$. [Hint: Construct an adversary by reducing from $n$ to $n - 1$ as soon as the current partial ordering is not composed entirely of components having the form • or •-•.] (b) Similarly, prove that

$$U_t(n) \ge \min\bigl(2 + U_t(n-1),\ 3 + U_{t-1}(n-1),\ 3 + U_t(n-2)\bigr)$$

for $n \ge 5$, by constructing an adversary that deals with components •, •-•, [diagram], [diagram]. (c) Therefore we have $U_t(n) \ge n + t + \min(\lfloor(n-t)/2\rfloor, t) - 3$ for $1 \le t \le n/2$. [The inequalities in (a) and (b) apply also when $V$ or $W$ replaces $U$, thereby establishing the optimality of several entries in Table 1.]

► 27. [M34] A randomized adversary is an adversary algorithm that is allowed to flip coins as it makes decisions.

a) Let $A$ be a randomized adversary and let $\Pr(l)$ be the probability that $A$ reaches leaf $l$ of a given comparison tree. Show that if $\Pr(l) \le p$ for all $l$, the height of the comparison tree is $\ge \lg(1/p)$.

b) Consider the following adversary for the problem of selecting the $t$th largest of $n$ elements, given integer parameters $q$ and $r$ to be selected later:

A1. Choose a random set $T$ of $t$ elements; all $\binom{n}{t}$ possibilities are equally likely. (We will ensure that the $t - 1$ largest elements belong to $T$.) Let $S = \{1, \ldots, n\} \setminus T$ be the other elements, and set $S_0 \gets S$, $T_0 \gets T$; $S_0$ and $T_0$ will represent elements that might become the $t$th largest.

A2. While $|T_0| > r$, decide all comparisons $x\!:\!y$ as follows: If $x \in S$ and $y \in T$, say that $x < y$. If $x \in S$ and $y \in S$, flip a coin to decide, and remove the smaller element from $S_0$ if it was in $S_0$. If $x \in T$ and $y \in T$, flip a coin to decide, and remove the larger element from $T_0$ if it was in $T_0$.

A3. As soon as $|T_0| = r$, partition the elements into three classes $P$, $Q$, $R$ as follows: If $|S_0| < q$, let $P = S$, $Q = T_0$, $R = T \setminus T_0$. Otherwise, for each $y \in T_0$, let $C(y)$ be the elements of $S$ already compared with $y$, and choose $y_0$ so that $|C(y_0)|$ is minimum. Let $P = (S \setminus S_0) \cup C(y_0)$, $Q = (S_0 \setminus C(y_0)) \cup \{y_0\}$, $R = T \setminus \{y_0\}$. Decide all future comparisons $x\!:\!y$ by saying that elements of $P$ are less than elements of $Q$, and elements of $Q$ are less than elements of $R$; flip a coin when $x$ and $y$ are in the same class. ∎

Prove that if $1 \le r \le t$ and if $|C(y_0)| \le q - r$ at the beginning of step A3, each leaf is reached with probability $\le (n + 1 - t)\big/\bigl(2^{n-q}\binom{n}{t}\bigr)$. Hint: Show that at least $n - q$ coin flips are made.

c) Continuing (b), show that we have

$$V_t(n) \ge \min\Bigl(n - 1 + (r-1)(q+1-r),\ \ n - q + \lg\Bigl(\binom{n}{t}\Bigm/(n+1-t)\Bigr)\Bigr),$$

for all integers $q$ and $r$.

d) Establish (14) by choosing $q$ and $r$.

*5.3.4.  Networks  for  Sorting 

In this section we shall study a constrained type of sorting that is particularly interesting because of its applications and its rich underlying theory. The new constraint is to insist on an oblivious sequence of comparisons, in the sense that whenever we compare $K_i$ versus $K_j$ the subsequent comparisons for the case $K_i < K_j$ are exactly the same as for the case $K_i > K_j$, but with $i$ and $j$ interchanged.

Figure 43(a) shows a comparison tree in which this homogeneity condition is satisfied. Notice that every level has the same number of comparisons, so there are $2^m$ outcomes after $m$ comparisons have been made. But $n!$ is not a power of 2; some of the comparisons must therefore be redundant, in the sense that one of their subtrees can never arise in practice. In other words, some branches of the tree must make more comparisons than necessary, in order to ensure that all of the corresponding branches of the tree will sort properly.

Fig. 43. (a) An oblivious comparison tree. (b) The corresponding network.

Since each path from top to bottom of such a tree determines the entire tree, such a sorting scheme is most easily represented as a network; see Fig. 43(b). The boxes in such a network represent “comparator modules” that have two inputs (represented as lines coming into the module from above) and two outputs (represented as lines leading downward); the left-hand output is the smaller of the two inputs, and the right-hand output is the larger. At the bottom of the network, $K_1'$ is the smallest of $\{K_1, K_2, K_3, K_4\}$, $K_2'$ the second smallest, etc. It is not difficult to prove that any sorting network corresponds to an oblivious comparison tree in the sense above, and any oblivious tree corresponds to a network of comparator modules.

Incidentally, we may note that comparator modules are fairly easy to manufacture, from an engineering point of view. For example, assume that the lines contain binary numbers, where one bit enters each module per unit time, most significant bit first. Each comparator module has three states, and behaves as follows:


   Time t            Time t + 1
State   Inputs     State   Outputs
  0      0 0         0      0 0
  0      0 1         1      0 1
  0      1 0         2      0 1
  0      1 1         0      1 1
  1      x y         1      x y
  2      x y         2      y x

Initially all modules are in state 0 and are outputting 0 0. A module enters either state 1 or state 2 as soon as its inputs differ. Numbers that begin to be transmitted at the top of Fig. 43(b) at time $t$ will begin to be output at the bottom, in sorted order, at time $t + 3$, if a suitable delay element is attached to the $K_1'$ and $K_4'$ lines.
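The three-state rule above can be simulated in a few lines. This minimal Python sketch (function names are ours) feeds two equal-length bit strings into one module, most significant bit first, and ignores the one-step output delay of real hardware: while the inputs agree the module stays in state 0, the first differing bit fixes the order, and from then on the bits either pass straight through (state 1) or cross over (state 2).

```python
def serial_comparator(xbits, ybits):
    """Simulate one three-state comparator module on equal-length bit lists,
    most significant bit first.  Returns (smaller, larger) as bit lists."""
    state = 0
    low, high = [], []
    for x, y in zip(xbits, ybits):
        if state == 0:
            if x != y:                      # first differing bit decides the order
                state = 1 if x < y else 2
            out = (min(x, y), max(x, y))    # 00 -> 00, 01 -> 01, 10 -> 01, 11 -> 11
        elif state == 1:                    # left input known to be smaller: pass through
            out = (x, y)
        else:                               # left input known to be larger: cross over
            out = (y, x)
        low.append(out[0])
        high.append(out[1])
    return low, high


def bits(v, width):
    """Bits of v, most significant first."""
    return [(v >> i) & 1 for i in range(width - 1, -1, -1)]


def value(bs):
    return int("".join(map(str, bs)), 2)


low, high = serial_comparator(bits(6, 3), bits(5, 3))
print(value(low), value(high))   # 5 6
```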
Fig. 44. Another way to represent the network of Fig. 43, as it sorts the sequence of four numbers ⟨4, 1, 3, 2⟩.


In order to develop the theory of sorting networks it is convenient to represent them in a slightly different way, illustrated in Fig. 44. Here numbers enter at the left, and comparator modules are represented by vertical connections between two lines; each comparator causes an interchange of its inputs, if necessary, so that the larger number sinks to the lower line after passing the comparator. At the right of the diagram all the numbers are in order from top to bottom.
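In this representation a network is simply a list of comparators applied in order. The minimal Python sketch below uses names of our own choosing, numbers the lines 0 to 3 from the top, and assumes that the five comparators of Figs. 43 and 44 are [1:2], [3:4], [1:3], [2:4], [2:3] in the book’s 1-based numbering. This is our reading, consistent with the later remark that the first four route the smallest and largest elements to their places and the last puts the middle two in order.

```python
def apply_network(network, values):
    """Run a comparator network; comparator (i, j) with i < j leaves the smaller
    value on line i and the larger on line j, so larger values sink downward."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


# Assumed comparators of Figs. 43 and 44, lines numbered 0..3 from the top.
FIG43 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

v = [4, 1, 3, 2]
for comparator in FIG43:          # show the state after each comparator
    v = apply_network([comparator], v)
    print(comparator, v)
```

The successive states are [1, 4, 3, 2], [1, 4, 2, 3], [1, 4, 2, 3], [1, 3, 2, 4], and finally [1, 2, 3, 4].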
Our previous studies of optimal sorting have concentrated on minimizing
the  number  of  comparisons,  with  little  or  no  regard  for  any  underlying  data 
movement  or  for  the  complexity  of  the  decision  structure  that  may  be  necessary. 
In  this  respect  sorting  networks  have  obvious  advantages,  since  the  data  can  be 
maintained  in  n locations  and  the  decision  structure  is  “straight  line”  — there 
is  no  need  to  remember  the  results  of  previous  comparisons,  since  the  plan  is 
immutably  fixed  in  advance.  Another  important  advantage  of  sorting  networks 
is that we can usually overlap several of the operations, performing them simultaneously (on a suitable machine). For example, the five steps in Figs. 43 and 44
can  be  collapsed  into  three  when  simultaneous  nonoverlapping  comparisons  are 
allowed,  since  the  first  two  and  the  second  two  can  be  combined.  We  shall  exploit 
this  property  of  sorting  networks  later  in  this  section.  Thus  sorting  networks  can 
be  very  useful,  although  it  is  not  at  all  obvious  that  efficient  n-element  sorting 
networks  can  be  constructed  for  large  n;  we  may  find  that  many  additional 
comparisons  are  needed  in  order  to  keep  the  decision  structure  oblivious. 
Fig. 45. Making $(n+1)$-sorters from $n$-sorters: (a) insertion, (b) selection.


There are two simple ways to construct a sorting network for $n + 1$ elements when an $n$-element network is given, using either the principle of insertion or the principle of selection. Figure 45(a) shows how the $(n+1)$st element can be inserted into its proper place after the first $n$ elements have been sorted; and part (b) of the figure shows how the largest element can be selected before we proceed to sort the remaining ones. Repeated application of Fig. 45(a) gives the network analog of straight insertion sorting (Algorithm 5.2.1S), and repeated application of Fig. 45(b) yields the network analog of the bubble sort (Algorithm 5.2.2B). Figure 46 shows the corresponding six-element networks.




Fig.  46.  Network  analogs  of  elementary  internal  sorting  schemes,  obtained  by  applying 
the  constructions  of  Fig.  45  repeatedly:  (a)  straight  insertion,  (b)  bubble  sort. 
Fig. 47. With parallelism, straight insertion = bubble sort!

Notice  that  when  we  collapse  either  network  together  to  allow  simultaneous 
operations, both methods actually reduce to the same “triangular” $(2n-3)$-stage procedure (Fig. 47).
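The collapse can be checked by scheduling each comparator as early as its two lines allow. In the minimal Python sketch below (names are ours), `insertion_network` and `bubble_network` are our encodings of repeated Fig. 45(a) and Fig. 45(b), and `stages` is a greedy as-soon-as-possible schedule. It shows that both networks need $2n - 3$ parallel steps, and that the four-line network of Figs. 43 and 44 (under the same assumed comparator order as before) needs three.

```python
from itertools import permutations


def insertion_network(n):
    """Fig. 45(a) applied repeatedly: line k (k = 1, ..., n-1) is inserted
    into the already sorted lines 0..k-1 by comparators (k-1, k), ..., (0, 1)."""
    return [(i - 1, i) for k in range(1, n) for i in range(k, 0, -1)]


def bubble_network(n):
    """Fig. 45(b) applied repeatedly: each pass carries the largest remaining
    value down to line `last` by comparators (0, 1), ..., (last-1, last)."""
    return [(i, i + 1) for last in range(n - 1, 0, -1) for i in range(last)]


def stages(network, n):
    """Parallel steps needed if every comparator fires as soon as both of its
    lines are free, so comparators on disjoint lines share a step."""
    ready = [0] * n
    for i, j in network:
        step = max(ready[i], ready[j]) + 1
        ready[i] = ready[j] = step
    return max(ready, default=0)


def sorts(network, values):
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v == sorted(v)


for p in permutations(range(5)):
    assert sorts(insertion_network(5), p) and sorts(bubble_network(5), p)
for n in range(2, 9):
    assert stages(insertion_network(n), n) == stages(bubble_network(n), n) == 2 * n - 3
print(stages([(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)], 4))   # 3
```

Both networks contain $n(n-1)/2$ comparators; only their order differs, and the greedy schedule packs either one into the same triangle.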

It is easy to prove that the network of Figs. 43 and 44 will sort any set of four numbers into order, since the first four comparators route the smallest and the largest elements to the correct places, and the last comparator puts the remaining two elements in order. But it is not always so easy to tell whether or not a given network will sort all possible input sequences; for example, both

[diagrams of two 4-element networks]

are valid 4-element sorting networks, but the proofs of their validity are not trivial. It would be sufficient to test each $n$-element network on all $n!$ permutations of $n$ distinct numbers, but in fact we can get by with far fewer tests:

Theorem Z (Zero-one principle). If a network with $n$ input lines sorts all $2^n$ sequences of 0s and 1s into nondecreasing order, it will sort any arbitrary sequence of $n$ numbers into nondecreasing order.

Proof. (This is a special case of Bouricius’s theorem, exercise 5.3.1–12.) If $f(x)$ is any monotonic function, with $f(x) \le f(y)$ whenever $x \le y$, and if a given network transforms $\langle x_1, \ldots, x_n\rangle$ into $\langle y_1, \ldots, y_n\rangle$, then it is easy to see that the network will transform $\langle f(x_1), \ldots, f(x_n)\rangle$ into $\langle f(y_1), \ldots, f(y_n)\rangle$. If $y_i > y_{i+1}$ for some $i$, consider the monotonic function $f$ that takes all numbers $< y_i$ into 0 and all numbers $\ge y_i$ into 1; this defines a sequence $\langle f(x_1), \ldots, f(x_n)\rangle$ of 0s and 1s that is not sorted by the network. Hence if all 0-1 sequences are sorted, we have $y_i \le y_{i+1}$ for $1 \le i < n$. ∎
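Theorem Z turns validity checking into a finite test over $2^n$ inputs instead of $n!$. The minimal Python sketch below (names are ours) implements that test and reports the first 0-1 input a network fails on. It uses the same assumed reading of the Fig. 43 network as before and shows that dropping its last comparator is caught by the input ⟨0, 1, 0, 1⟩. For these small networks it also confirms that the 0-1 test agrees with exhaustive testing on permutations.

```python
from itertools import permutations, product


def apply_network(network, values):
    v = list(values)
    for i, j in network:                 # comparator: smaller value to line i
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def failing_01_input(network, n):
    """First 0-1 input (in lexicographic order) that the network leaves
    unsorted, or None if all 2^n such inputs come out sorted."""
    for bits in product((0, 1), repeat=n):
        if apply_network(network, bits) != sorted(bits):
            return bits
    return None


def is_sorting_network(network, n):
    """By Theorem Z, checking the 2^n zero-one inputs is enough."""
    return failing_01_input(network, n) is None


FIG43 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]   # assumed reading of Fig. 43

# For these small cases, Theorem Z agrees with testing every permutation.
for net in (FIG43, FIG43[:4]):
    exhaustive = all(apply_network(net, p) == sorted(p) for p in permutations(range(4)))
    assert exhaustive == is_sorting_network(net, 4)

print(is_sorting_network(FIG43, 4))          # True
print(failing_01_input(FIG43[:4], 4))        # (0, 1, 0, 1)
```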

The zero-one principle is quite helpful in the construction of sorting networks. As a nontrivial example, we can derive a generalized version of Batcher’s “merge exchange” sort (Algorithm 5.2.2M). The idea is to sort $m + n$ elements by (i) sorting the first $m$ and the last $n$ independently, then (ii) applying an $(m,n)$-merging network to the result. An $(m,n)$-merging network can be constructed inductively as follows:

a) If $m = 0$ or $n = 0$, the network is empty. If $m = n = 1$, the network is a single comparator module.

b) If $mn > 1$, let the sequences to be merged be $\langle x_1, \ldots, x_m\rangle$ and $\langle y_1, \ldots, y_n\rangle$. Merge the “odd sequences” $\langle x_1, x_3, \ldots, x_{2\lceil m/2\rceil-1}\rangle$ and $\langle y_1, y_3, \ldots, y_{2\lceil n/2\rceil-1}\rangle$, obtaining the sorted result $\langle v_1, v_2, \ldots, v_{\lceil m/2\rceil+\lceil n/2\rceil}\rangle$; also merge the “even sequences” $\langle x_2, x_4, \ldots, x_{2\lfloor m/2\rfloor}\rangle$ and $\langle y_2, y_4, \ldots, y_{2\lfloor n/2\rfloor}\rangle$, obtaining the sorted result $\langle w_1, w_2, \ldots, w_{\lfloor m/2\rfloor+\lfloor n/2\rfloor}\rangle$. Finally, apply the comparison-interchange operations

$$w_1\!:\!v_2,\quad w_2\!:\!v_3,\quad w_3\!:\!v_4,\quad \ldots,\quad w_{\lfloor m/2\rfloor+\lfloor n/2\rfloor}\!:\!v^* \tag{1}$$

to the sequence

$$\langle v_1, w_1, v_2, w_2, v_3, w_3, \ldots, v_{\lfloor m/2\rfloor+\lfloor n/2\rfloor}, w_{\lfloor m/2\rfloor+\lfloor n/2\rfloor}, v^*, v^{**}\rangle; \tag{2}$$

the result will be sorted(!). Here $v^* = v_{\lfloor m/2\rfloor+\lfloor n/2\rfloor+1}$ does not exist if both $m$ and $n$ are even, and $v^{**} = v_{\lfloor m/2\rfloor+\lfloor n/2\rfloor+2}$ does not exist unless both $m$ and $n$ are odd; the total number of comparator modules indicated in (1) is $\lfloor(m + n - 1)/2\rfloor$.

Batcher’s $(m,n)$-merging network is called the odd-even merge. A $(4,7)$-merge constructed according to these principles is illustrated in Fig. 48.

Fig. 48. The odd-even merge, when $m = 4$ and $n = 7$.

To prove that this rather strange merging procedure actually works, when $mn > 1$, we use the zero-one principle, testing it on all sequences of 0s and 1s. After the initial $m$-sort and $n$-sort, the sequence $\langle x_1, \ldots, x_m\rangle$ will consist of $k$ 0s followed by $m - k$ 1s, and the sequence $\langle y_1, \ldots, y_n\rangle$ will be $l$ 0s followed by $n - l$ 1s, for some $k$ and $l$. Hence the sequence $\langle v_1, v_2, \ldots\rangle$ will consist of exactly $\lceil k/2\rceil + \lceil l/2\rceil$ 0s, followed by 1s; and $\langle w_1, w_2, \ldots\rangle$ will consist of $\lfloor k/2\rfloor + \lfloor l/2\rfloor$ 0s, followed by 1s. Now here’s the point:

$$\bigl(\lceil k/2\rceil + \lceil l/2\rceil\bigr) - \bigl(\lfloor k/2\rfloor + \lfloor l/2\rfloor\bigr) = 0,\ 1,\ \text{or}\ 2. \tag{3}$$

If this difference is 0 or 1, the sequence (2) is already in order, and if the difference is 2 one of the comparison-interchanges in (1) will fix everything up. This completes the proof. (Note that the zero-one principle reduces the merging problem from a consideration of $\binom{m+n}{m}$ cases to only $(m+1)(n+1)$, represented by the two parameters $k$ and $l$.)

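Let C(m, n) be the number of comparator modules used in the odd-even merge for m and n, not counting the initial m-sort and n-sort; we have

$$C(m,n) = \begin{cases} mn, & \text{if } mn \le 1;\\ C(\lceil m/2\rceil, \lceil n/2\rceil) + C(\lfloor m/2\rfloor, \lfloor n/2\rfloor) + \lfloor (m+n-1)/2\rfloor, & \text{if } mn > 1. \end{cases} \qquad (4)$$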
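This is not an especially simple function of m and n, in general, but by noting that C(1, n) = n and that

$$C(m+1, n+1) - C(m,n) = 1 + C(\lfloor m/2\rfloor + 1, \lfloor n/2\rfloor + 1) - C(\lfloor m/2\rfloor, \lfloor n/2\rfloor), \quad \text{if } mn \ge 1,$$

we can derive the relation

$$C(m+1, n+1) - C(m,n) = \lfloor \lg m\rfloor + 2 + \bigl\lfloor n / 2^{\lfloor \lg m\rfloor + 1}\bigr\rfloor, \quad \text{if } n \ge m \ge 1. \qquad (5)$$

Consequently

$$C(m, m+r) = B(m) + m + R_m(r), \quad \text{for } m \ge 0 \text{ and } r \ge 0, \qquad (6)$$

where $B(m) = \sum_{k=1}^{m} \lceil \lg k\rceil$ is the “binary insertion” function of Eq. 5.3.1–(3), and where $R_m(r)$ denotes the sum of the first m terms of the series

$$\Bigl\lfloor \frac{r+0}{1}\Bigr\rfloor + \Bigl\lfloor \frac{r+1}{2}\Bigr\rfloor + \Bigl\lfloor \frac{r+2}{4}\Bigr\rfloor + \Bigl\lfloor \frac{r+3}{4}\Bigr\rfloor + \Bigl\lfloor \frac{r+4}{8}\Bigr\rfloor + \cdots + \Bigl\lfloor \frac{r+j}{2^{\lfloor \lg j\rfloor + 1}}\Bigr\rfloor + \cdots. \qquad (7)$$

In particular, when r = 0 we have the important special case

$$C(m, m) = B(m) + m. \qquad (8)$$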

Furthermore if $t = \lceil \lg m\rceil$,

$$R_m(r + 2^t) = R_m(r) + 1\cdot 2^{t-1} + 2\cdot 2^{t-2} + \cdots + 2^{t-1}\cdot 2^0 + m = R_m(r) + m + t\cdot 2^{t-1}.$$

Hence $C(m, n + 2^t) - C(m, n)$ has a simple form, and

$$C(m,n) = \Bigl(\frac{t}{2} + \frac{m}{2^t}\Bigr) n + O(1), \quad \text{for } m \text{ fixed},\ n \to \infty,\ t = \lceil \lg m\rceil; \qquad (9)$$

the O(1) term is an eventually periodic function of n, with period length $2^t$. As $n \to \infty$ we have $C(n,n) = n \lg n + O(n)$, by Eq. (8) and exercise 5.3.1–15.
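Relations (5), (6), (8), and the period-$2^t$ increment are easy to spot-check by machine. This sketch assumes only recurrence (4) and the definitions of B(m) and $R_m(r)$ above; the function names are illustrative. It relies on Python’s `int.bit_length`, which gives $\lceil \lg k\rceil$ as `(k - 1).bit_length()` and $\lfloor \lg j\rfloor + 1$ as `j.bit_length()` for positive integers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def C(m, n):
    """Comparators in the odd-even (m, n)-merge, from recurrence (4)."""
    if m * n <= 1:
        return m * n
    return C((m + 1) // 2, (n + 1) // 2) + C(m // 2, n // 2) + (m + n - 1) // 2


def B(m):
    """Binary insertion function: sum of ceil(lg k) for 1 <= k <= m."""
    return sum((k - 1).bit_length() for k in range(1, m + 1))


def R(m, r):
    """Sum of the first m terms of series (7).  Term j is
    floor((r + j) / 2^(floor(lg j) + 1)); term 0 is r."""
    return sum((r + j) >> j.bit_length() for j in range(m))


for m in range(1, 30):
    t = (m - 1).bit_length()                       # t = ceil(lg m)
    for n in range(m, 60):
        # relation (5), for n >= m >= 1
        assert C(m + 1, n + 1) - C(m, n) == m.bit_length() + 1 + (n >> m.bit_length())
        # increment over one period, from R_m(r + 2^t) = R_m(r) + m + t 2^(t-1)
        assert C(m, n + 2 ** t) - C(m, n) == m + t * 2 ** t // 2
    for r in range(60):
        assert C(m, m + r) == B(m) + m + R(m, r)   # relation (6)
    assert C(m, m) == B(m) + m                     # relation (8)

print(C(4, 7), C(16, 16))   # 16 65
```

Dividing the period increment $m + t\cdot 2^{t-1}$ by the period length $2^t$ gives the slope $t/2 + m/2^t$ that appears in (9).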

Minimum-comparison networks. Let $\hat S(n)$ be the minimum number of comparators needed in a sorting network for n elements; clearly $\hat S(n) \ge S(n)$, where S(n) is the minimum number of comparisons needed in a not-necessarily-oblivious sorting procedure (see Section 5.3.1). We have $\hat S(4) = 5 = S(4)$, so the new constraint causes no loss of efficiency when n = 4; but already when n = 5 it turns out that $\hat S(5) = 9$ while S(5) = 7. The problem of determining $\hat S(n)$ seems to be even harder than the problem of determining S(n); even the asymptotic behavior of $\hat S(n)$ is known only in a very weak sense.

It is interesting to trace the history of this problem, since each step was forged with some difficulty. Sorting networks were first explored by P. N. Armstrong, R. J. Nelson, and D. G. O’Connor, about 1954 [see U.S. Patent 3029413]; in the words of their patent attorney, “By the use of skill, it is possible to design economical n-line sorting switches using a reduced number of two-line sorting switches.” After observing that $\hat S(n+1) \le \hat S(n) + n$, they gave special constructions for 4 ≤ n ≤ 8, using 5, 9, 12, 18, and 19 comparators, respectively.
Then Nelson worked together with R. C. Bose to show that $\hat S(2^n) \le 3^n - 2^n$ for all n; hence $\hat S(n) = O(n^{\lg 3}) = O(n^{1.585})$. Bose and Nelson published their interesting method in JACM 9 (1962), 282–296, where they conjectured that it was best possible; T. N. Hibbard [JACM 10 (1963), 142–150] found a similar but slightly simpler construction that used the same number of comparisons, thereby reinforcing the conjecture.

In 1964, R. W. Floyd and D. E. Knuth found a new way to approach the problem, leading to an asymptotic bound of the form $\hat S(n) = O(n^{1+c/\sqrt{\log n}})$. Working independently, K. E. Batcher discovered the general merging strategy outlined above. Using a number of comparators defined by the recursion

$$c(1) = 0, \qquad c(n) = c(\lceil n/2\rceil) + c(\lfloor n/2\rfloor) + C(\lceil n/2\rceil, \lfloor n/2\rfloor) \quad \text{for } n \ge 2, \qquad (10)$$

he proved (see exercise 5.2.2–14) that

$$c(2^t) = (t^2 - t + 4)\,2^{t-2} - 1;$$

consequently $\hat S(n) = O(n(\log n)^2)$. Neither Floyd and Knuth nor Batcher published their constructions until some time later [Notices of the Amer. Math. Soc. 14 (1967), 283; Proc. AFIPS Spring Joint Computer Conf. 32 (1968), 307–314].

Several people have found ways to reduce the number of comparators used by Batcher’s merge-exchange construction; the following table shows the best upper bounds currently known for $\hat S(n)$:

n:              1  2  3  4  5   6   7   8   9  10  11  12  13  14  15  16
c(n) =          0  1  3  5  9  12  16  19  26  31  37  41  48  53  59  63    (11)
$\hat S(n)$ ≤   0  1  3  5  9  12  16  19  25  29  35  39  45  51  56  60

Since $\hat S(n) < c(n)$ for 8 < n ≤ 16, merge exchange is nonoptimal for all n > 8.
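Recursion (10) can be evaluated directly to reproduce the c(n) row of table (11) and the closed form for $c(2^t)$. A minimal sketch with illustrative function names, reusing recurrence (4) for C(m, n):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def C(m, n):
    """Comparators in the odd-even (m, n)-merge, recurrence (4)."""
    if m * n <= 1:
        return m * n
    return C((m + 1) // 2, (n + 1) // 2) + C(m // 2, n // 2) + (m + n - 1) // 2


@lru_cache(maxsize=None)
def c(n):
    """Comparators in Batcher's merge-exchange sort, recursion (10)."""
    if n == 1:
        return 0
    return c((n + 1) // 2) + c(n // 2) + C((n + 1) // 2, n // 2)


print([c(n) for n in range(1, 17)])
# [0, 1, 3, 5, 9, 12, 16, 19, 26, 31, 37, 41, 48, 53, 59, 63]
print(c(256))   # 3839
```

The value c(256) = 3839 is the Batcher figure quoted later for 256 elements.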

When n ≤ 8, merge exchange uses the same number of comparators as the construction of Bose and Nelson. Floyd and Knuth proved in 1964–1966 that the values listed for $\hat S(n)$ are exact when n ≤ 8 [see A Survey of Combinatorial Theory (North-Holland, 1973), 163–172]; the values of $\hat S(n)$ for n > 8 are still not known.

Constructions that lead to the values in (11) are shown in Fig. 49. The
network  for  n = 9,  based  on  an  interesting  three-way  merge,  was  found  by  R.  W. 
Floyd  in  1964;  its  validity  can  be  established  by  using  the  general  principle 
described  in  exercise  27.  The  network  for  n = 10  was  discovered  by  A.  Waksman 
in  1969,  by  regarding  the  inputs  as  permutations  of  {1,2,...,  10}  and  trying  to 
reduce  as  much  as  possible  the  number  of  values  that  can  appear  on  each  line  at 
a given  stage,  while  maintaining  some  symmetry. 

The network shown for n = 13 has quite a different pedigree: Hugues Juillé [Lecture Notes in Comp. Sci. 929 (1995), 246–260] used a computer program to construct it, by simulating an evolutionary process of genetic breeding. The network exhibits no obvious rhyme or reason, but it works, and it’s shorter than any other construction devised so far by human ratiocination.

Fig. 49. Efficient sorting networks. (Panel shown: n = 16, 60 modules, delay 10.)

A 62-comparator sorting network for 16 elements was found by G. Shapiro in 1969, and this was rather surprising since Batcher’s method (63 comparisons) would appear to be at its best when n is a power of 2. Soon after hearing of
Shapiro’s  construction,  M.  W.  Green  tripled  the  amount  of  surprise  by  finding 
the  60-comparison  sorter  in  Fig.  49.  The  first  portion  of  Green’s  construction 
is  fairly  easy  to  understand;  after  the  32  comparison/interchanges  to  the  left  of 
the  dotted  line  have  been  made,  the  lines  can  be  labeled  with  the  16  subsets  of 
{a, b, c, d}, in such a way that the line labeled s is known to contain a number less
than  or  equal  to  the  contents  of  the  line  labeled  t whenever  s is  a subset  of  t.  The 
state  of  the  sort  at  this  point  is  discussed  further  in  exercise  32.  Comparisons 
made  on  subsequent  levels  of  Green’s  network  become  increasingly  mysterious, 
however,  and  as  yet  nobody  has  seen  how  to  generalize  the  construction  in  order 
to  obtain  correspondingly  efficient  networks  for  higher  values  of  n. 

Shapiro and Green also discovered the network shown for n = 12. When
n = 11, 14, or 15, good networks can be found by removing the bottom line of
the  network  for  n + 1,  together  with  all  comparators  touching  that  line. 
The best sorting network currently known for 256 elements, due to D. Van Voorhis, shows that $\hat S(256) \le 3651$, compared to 3839 by Batcher’s method. [See R. L. Drysdale and F. H. Young, SICOMP 4 (1975), 264–270.] As $n \to \infty$, it turns out in fact that $\hat S(n) = O(n \log n)$; this astonishing upper bound was proved by Ajtai, Komlós, and Szemerédi in Combinatorica 3 (1983), 1–19. The networks they constructed are not of practical interest, since many comparators were introduced just to save a factor of log n; Batcher’s method is much better, unless n exceeds the total memory capacity of all computers on earth! But the theorem of Ajtai, Komlós, and Szemerédi does establish the true asymptotic growth rate of $\hat S(n)$, up to a constant factor.

Minimum-time  networks.  In  physical  realizations  of  sorting  networks,  and 
on  parallel  computers,  it  is  possible  to  do  nonoverlapping  comparison-exchanges 
at  the  same  time;  therefore  it  is  natural  to  try  to  minimize  the  delay  time.  A 
moment’s  reflection  shows  that  the  delay  time  of  a sorting  network  is  equal  to 
the  maximum  number  of  comparators  in  contact  with  any  “path”  through  the 
network,  if  we  define  a path  to  consist  of  any  left-to-right  route  that  possibly 
switches  lines  at  the  comparators.  We  can  put  a sequence  number  on  each 
comparator  indicating  the  earliest  time  it  can  be  executed;  this  is  one  higher  than 
the  maximum  of  the  sequence  numbers  of  the  comparators  that  occur  earlier  on 
its  input  lines.  (See  Fig.  50(a);  part  (b)  of  the  figure  shows  the  same  network 
redrawn  so  that  each  comparison  is  done  at  the  earliest  possible  moment.) 


Fig.  50.  Doing  each  comparison  at  the  earliest  possible  time. 
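The sequence-numbering rule just described is a single left-to-right pass. In this sketch (hypothetical names `earliest_times` and `delay_time`; lines numbered from 0), each comparator’s number is one more than the largest number already recorded on either of its lines, and the delay time is the largest number assigned. Grouping comparators that receive equal numbers gives a redrawing in the style of Fig. 50(b).

```python
def earliest_times(comparators, n):
    """Sequence number of each comparator: one more than the largest number
    already assigned to a comparator on either of its two lines."""
    last = [0] * n          # last[k]: number of the latest comparator on line k
    times = []
    for i, j in comparators:
        step = 1 + max(last[i], last[j])
        last[i] = last[j] = step
        times.append(step)
    return times


def delay_time(comparators, n):
    """Delay time of the network: the largest sequence number used."""
    return max(earliest_times(comparators, n), default=0)


net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]   # [1:2][3:4][1:3][2:4][2:3]
print(earliest_times(net4, 4), delay_time(net4, 4))   # [1, 1, 2, 2, 3] 3
```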


Batcher’s odd-even merging network described above takes $T_B(m,n)$ units of time, where $T_B(m,0) = T_B(0,n) = 0$, $T_B(1,1) = 1$, and

$$T_B(m,n) = 1 + \max\bigl(T_B(\lfloor m/2\rfloor, \lfloor n/2\rfloor),\ T_B(\lceil m/2\rceil, \lceil n/2\rceil)\bigr) \quad \text{for } mn \ge 2.$$

We can use these relations to prove that $T_B(m, n+1) \ge T_B(m,n)$, by induction; hence $T_B(m,n) = 1 + T_B(\lceil m/2\rceil, \lceil n/2\rceil)$ for mn ≥ 2, and it follows that

$$T_B(m,n) = 1 + \lceil \lg \max(m,n)\rceil, \quad \text{for } mn \ge 1. \qquad (12)$$

Exercise 5 shows that Batcher’s sorting method therefore has a delay time of

$$\binom{\lceil \lg n\rceil + 1}{2}. \qquad (13)$$
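Formulas (12) and (13) can be checked numerically. The sketch below evaluates the recurrence for $T_B$ directly; for the sorting method it assumes, as a simplification, that the two halves are sorted side by side and that their merge starts only after both have finished, which gives the recurrence in `sort_delay` (a hypothetical helper). The assertions in the accompanying checks compare both against the closed forms.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T_B(m, n):
    """Delay of the odd-even (m, n)-merge, from the recurrence in the text."""
    if m == 0 or n == 0:
        return 0
    if m == 1 and n == 1:
        return 1
    return 1 + max(T_B(m // 2, n // 2), T_B((m + 1) // 2, (n + 1) // 2))


def ceil_lg(x):
    """ceil(lg x) for x >= 1."""
    return (x - 1).bit_length()


@lru_cache(maxsize=None)
def sort_delay(n):
    """Assumed composition: sort both halves in parallel, then merge them."""
    if n <= 1:
        return 0
    return max(sort_delay((n + 1) // 2), sort_delay(n // 2)) + T_B((n + 1) // 2, n // 2)


print(T_B(4, 7), sort_delay(16))   # 4 10
```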

Fig. 51. Sorting networks that are the fastest known, when comparisons are performed in parallel. (Panels include n = 10: 31 modules, delay 7; n = 11: 35 modules, delay 8; n = 16: 61 modules, delay 9.)

Let $\hat T(n)$ be the minimum achievable delay time in any sorting network for n elements. It is possible to improve some of the networks described above so that they have smaller delay time but use no more comparators, as shown for n = 6, n = 9, and n = 11 in Fig. 51, and for n = 10 in exercise 7. Still smaller delay time can be achieved if we add one or two extra comparator modules, as shown in the remarkable networks for n = 10, 12, and 16 in Fig. 51. These constructions yield the following upper bounds on $\hat T(n)$ for small n:

n:              1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
$\hat T(n)$ ≤   0  1  3  3  5  5  6  6  7   7   8   8   9   9   9   9    (14)

For n ≤ 10 the values given here are known to be exact (see exercise 4). The networks in Fig. 51 merit careful study, because it is by no means obvious that they always sort. Some of these networks were discovered in 1969–1971 by G. Shapiro (n = 6, 12) and D. Van Voorhis (n = 10, 16); the others were found in 2001 by Loren Schwiebert, using genetic methods (n = 9, 11).
Merging networks. Let $\hat M(m,n)$ denote the minimum number of comparator modules needed in a network that merges m elements $x_1 \le \cdots \le x_m$ with n elements $y_1 \le \cdots \le y_n$ to form the sorted sequence $z_1 \le \cdots \le z_{m+n}$. At present no merging networks have been discovered that are superior to the odd-even merge described above; hence the function C(m, n) in (6) represents the best upper bound known for $\hat M(m,n)$.

R.  W.  Floyd  has  discovered  an  interesting  way  to  find  lower  bounds  for  this 
merging  problem. 

Theorem F. For all n ≥ 1, we have $\hat M(2n, 2n) \ge 2\hat M(n,n) + n$.

Proof. Consider a network with $\hat M(2n, 2n)$ comparator modules, capable of sorting all input sequences $(z_1, \ldots, z_{4n})$ such that $z_1 \le z_3 \le \cdots \le z_{4n-1}$ and $z_2 \le z_4 \le \cdots \le z_{4n}$. We may assume that each module replaces $(z_i, z_j)$ by $(\min(z_i, z_j), \max(z_i, z_j))$, for some i < j (see exercise 16). The comparators can therefore be divided into three classes:

a) i ≤ 2n and j ≤ 2n.

b) i > 2n and j > 2n.

c) i ≤ 2n and j > 2n.

Class (a) must contain at least $\hat M(n,n)$ comparators, since $z_{2n+1}, z_{2n+2}, \ldots, z_{4n}$ may be already in their final position when the merge starts; similarly, there are at least $\hat M(n,n)$ comparators in class (b). Furthermore the input sequence (0, 1, 0, 1, ..., 0, 1) shows that class (c) contains at least n comparators, since n zeros must move from $\{z_{2n+1}, \ldots, z_{4n}\}$ to $\{z_1, \ldots, z_{2n}\}$. ∎

Repeated use of Theorem F proves that $\hat M(2^m, 2^m) \ge \frac12 (m+2)2^m$; hence $\hat M(n,n) \ge \frac12 n \lg n + O(n)$. We know from Theorem 5.3.2M that merging without the network restriction requires only M(n, n) = 2n − 1 comparisons; hence we have proved that merging with networks is intrinsically harder than merging in general.
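The phrase “repeated use of Theorem F” unpacks into a short induction. The derivation below is a sketch that assumes only Theorem F and the fact that $\hat M(1,1) = 1$ (one comparator is needed, and suffices, to merge one element with one); the name f is introduced here only for brevity.

```latex
\begin{aligned}
f(m) &= \hat M(2^m, 2^m), \qquad f(0) = \hat M(1,1) = 1;\\
f(m+1) &\ge 2f(m) + 2^m \qquad \text{(Theorem F with } n = 2^m\text{)};\\
\frac{f(m+1)}{2^{m+1}} &\ge \frac{f(m)}{2^m} + \frac12
  \quad\Longrightarrow\quad \frac{f(m)}{2^m} \ge 1 + \frac m2
  \quad\Longrightarrow\quad f(m) \ge \tfrac12 (m+2)\,2^m .
\end{aligned}
```

With $n = 2^m$ the last line reads $\hat M(n,n) \ge \frac12 n \lg n + n$.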

The odd-even merge shows that

$$\hat M(m,n) \le C(m,n) = \tfrac12 (m+n) \lg\min(m,n) + O(m+n).$$

P. B. Miltersen, M. Paterson, and J. Tarui [JACM 43 (1996), 147–165] have improved Theorem F by establishing the lower bound

$$\hat M(m,n) \ge \tfrac12\bigl((m+n)\lg(m+1) - m/\ln 2\bigr) \quad \text{for } 1 \le m \le n.$$

Consequently $\hat M(m,n) = \tfrac12 (m+n)\lg\min(m,n) + O(m+n)$.

The exact formula $\hat M(2,n) = C(2,n) = \lceil 3n/2\rceil$ has been proved by A. C. Yao and F. F. Yao [JACM 23 (1976), 566–571]. The value of $\hat M(m,n)$ is also known to equal C(m, n) for m = n ≤ 5; see exercise 9.

Bitonic sorting. When simultaneous comparisons are allowed, we have seen in Eq. (12) that the odd-even merge uses $\lceil \lg(2n)\rceil$ units of delay time, when 1 ≤ m ≤ n. Batcher has devised another type of network for merging, called a bitonic sorter, which lowers the delay time to $\lceil \lg(m+n)\rceil$ although it requires more comparator modules. [See U.S. Patent 3428946 (1969).]

Fig. 52. Batcher’s bitonic sorter of order 7.

Let us say that a sequence $(z_1, \ldots, z_p)$ of p numbers is bitonic if $z_1 \ge \cdots \ge z_k \le \cdots \le z_p$ for some k, 1 ≤ k ≤ p. (Compare this with the ordinary definition of “monotonic” sequences.) A bitonic sorter of order p is a comparator network that is capable of sorting any bitonic sequence of length p into nondecreasing order. The problem of merging $x_1 \le \cdots \le x_m$ with $y_1 \le \cdots \le y_n$ is a special case of the bitonic sorting problem, since merging can be done by applying a bitonic sorter of order m + n to the sequence $(x_m, \ldots, x_1, y_1, \ldots, y_n)$.

Notice that when a sequence $(z_1, \ldots, z_p)$ is bitonic, so are all of its subsequences. Shortly after Batcher discovered the odd-even merging networks, he observed that we can construct a bitonic sorter of order p in an analogous way, by first sorting the bitonic subsequences $(z_1, z_3, z_5, \ldots)$ and $(z_2, z_4, z_6, \ldots)$ independently, then comparing and interchanging $z_1{:}z_2$, $z_3{:}z_4$, .... (See exercise 10 for a proof.) If C′(p) is the corresponding number of comparator modules, we have

$$C'(p) = C'(\lceil p/2\rceil) + C'(\lfloor p/2\rfloor) + \lfloor p/2\rfloor, \quad \text{for } p \ge 2; \qquad (15)$$

and the delay time is clearly $\lceil \lg p\rceil$. Figure 52 shows the bitonic sorter of order 7 constructed in this way: It can be used as a (3, 4)- as well as a (2, 5)-merging network, with three units of delay; the odd-even merge for m = 2 and n = 5 saves one comparator but adds one more level of delay.
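The recursive description of the bitonic sorter translates almost word for word into code. This sketch is illustrative: lines are numbered from 0, a comparator (a, b) leaves the smaller value on line a, the helper names are hypothetical, and `sorts_bitonic_01` tests only the 0-1 sequences of the form $1^k0^l1^{p-k-l}$ singled out in the hint to exercise 10.

```python
def bitonic_sorter(lines):
    """Bitonic sorter on the given lines, built as in the text: sort the
    odd- and even-numbered subsequences recursively, then compare
    z1:z2, z3:z4, ...  Each comparator (a, b) leaves the min on line a."""
    p = len(lines)
    if p <= 1:
        return []
    comps = bitonic_sorter(lines[0::2]) + bitonic_sorter(lines[1::2])
    comps += [(lines[k], lines[k + 1]) for k in range(0, p - 1, 2)]
    return comps


def apply_network(comparators, values):
    """Run the comparators in order on a copy of `values`."""
    z = list(values)
    for a, b in comparators:
        if z[a] > z[b]:
            z[a], z[b] = z[b], z[a]
    return z


def delay_time(comparators, n):
    """Largest earliest-possible step number over all comparators."""
    last = [0] * n
    for a, b in comparators:
        last[a] = last[b] = 1 + max(last[a], last[b])
    return max(last, default=0)


def sorts_bitonic_01(p):
    """Check every sequence of k 1s, then l 0s, then p - k - l 1s."""
    net = bitonic_sorter(list(range(p)))
    for k in range(p + 1):
        for l in range(p - k + 1):
            z = [1] * k + [0] * l + [1] * (p - k - l)
            if apply_network(net, z) != sorted(z):
                return False
    return True


net7 = bitonic_sorter(list(range(7)))
print(len(net7), delay_time(net7, 7), sorts_bitonic_01(7))   # 9 3 True
```

For p = 7 it produces 9 comparators in 3 levels, consistent with (15) and with Fig. 52.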

Batcher’s bitonic sorter of order $2^t$ is particularly interesting; it consists of t levels of $2^{t-1}$ comparators each. If we number the input lines $z_0, z_1, \ldots, z_{2^t-1}$, element $z_i$ is compared to $z_j$ on level l if and only if i and j differ only in the lth most significant bit of their binary representations. This simple structure leads to parallel sorting networks that are as fast as merge exchange, Algorithm 5.2.2M, but considerably easier to implement. (See exercises 11 and 13.)
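For order $2^t$ the levels can be generated from the bit pattern alone. The sketch below (hypothetical helper names) builds the levels and uses them to merge two sorted lists by reversing the first one, as in the reduction of merging to bitonic sorting described earlier; it assumes the total length is a positive power of 2.

```python
def bitonic_levels(t):
    """Levels of the bitonic sorter of order 2**t on lines 0 .. 2**t - 1.
    On level l (1-based), line i meets line i + bit, where bit is the l-th
    most significant of the t bits and that bit of i is 0; the smaller value
    goes to the smaller index."""
    levels = []
    for l in range(1, t + 1):
        bit = 1 << (t - l)
        levels.append([(i, i | bit) for i in range(2 ** t) if not i & bit])
    return levels


def bitonic_merge(xs, ys):
    """Merge sorted xs and ys (total length 2**t) by feeding the bitonic
    sequence (x_m, ..., x_1, y_1, ..., y_n) through the levels."""
    z = list(reversed(xs)) + list(ys)
    t = len(z).bit_length() - 1
    assert len(z) == 2 ** t, "this sketch assumes a power-of-2 total length"
    for level in bitonic_levels(t):
        for a, b in level:                  # comparators on a level are disjoint
            if z[a] > z[b]:
                z[a], z[b] = z[b], z[a]
    return z


print([len(level) for level in bitonic_levels(4)])   # [8, 8, 8, 8]
print(bitonic_merge([1, 4, 6, 7, 9], [0, 2, 3]))     # [0, 1, 2, 3, 4, 6, 7, 9]
```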

Bitonic merging is optimum, in the sense that no parallel merging method based on simultaneous disjoint comparisons can sort in fewer than $\lceil \lg(m+n)\rceil$ stages, whether it works obliviously or not. (See exercise 46.) Another way to achieve this optimum time, with fewer comparisons but a slightly more complicated control logic, is discussed in exercise 57.

When 1 ≤ m ≤ n, the nth smallest output of an (m, n)-merging network depends on 2m + [m<n] of the inputs (see exercise 29). If it can be computed by comparators with l levels of delay, it involves at most $2^l$ of the inputs; hence $2^l \ge 2m + [m<n]$, and $l \ge \lceil \lg(2m + [m<n])\rceil$. Batcher has shown [Report GER-14122 (Akron, Ohio: Goodyear Aerospace Corporation, 1968)] that this minimum delay time is achievable if we allow “multiple fanout” in the network, namely the splitting of lines so that the same number is fed to many modules at once. For example, one of his networks, capable of merging one item with n others after only two levels of delay, is illustrated for n = 6 in Fig. 53. Of course, networks with multiple fanout do not conform to our conventions, and it is fairly easy to see that any (1, n)-merging network without multiple fanout must have a delay time of lg(n + 1) or more. (See exercise 45.)

Fig. 53. Merging one item with six others, with multiple fanout, in order to achieve the minimum possible delay time. (Inputs $x_1, y_1, \ldots, y_6$; outputs $z_1, \ldots, z_7$.)

Selection networks. We can also use networks to approach the problem of Section 5.3.3. Let $\hat U_t(n)$ denote the minimum number of comparators required in a network that moves the t largest of n distinct inputs into t specified output lines; the numbers are allowed to appear in any order on these output lines. Let $\hat V_t(n)$ denote the minimum number of comparators required to move the tth largest of n distinct inputs into a specified output line; and let $\hat W_t(n)$ denote the minimum number of comparators required to move the t largest of n distinct inputs into t specified output lines in nondecreasing order. It is not difficult to deduce (see exercise 17) that

$$\hat U_t(n) \le \hat V_t(n) \le \hat W_t(n). \qquad (16)$$

Suppose first that we have 2t elements $(x_1, \ldots, x_{2t})$ and we wish to select the largest t. V. E. Alekseev [Kibernetika 5, 5 (1969), 99–103] has observed that we can do the job by first sorting $(x_1, \ldots, x_t)$ and $(x_{t+1}, \ldots, x_{2t})$, then comparing and interchanging

$$x_1{:}x_{2t},\quad x_2{:}x_{2t-1},\quad \ldots,\quad x_t{:}x_{t+1}. \qquad (17)$$

Since none of these pairs can contain more than one of the largest t elements (why?), Alekseev’s procedure must select the largest t elements.
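Alekseev’s procedure is easy to simulate. In this sketch Python’s `sorted` stands in for the two t-element sorting networks (an assumption made to keep the example short), positions are numbered from 0, and the helper names are hypothetical; the exhaustive check confirms on small cases that the larger element of each pair in (17) is always one of the t largest.

```python
import random
from itertools import permutations


def alekseev_largest(x):
    """Alekseev's procedure on 2t values: sort each half, then apply the
    comparisons (17).  The t largest values end up in positions t..2t-1
    (0-based), in no particular order."""
    t = len(x) // 2
    z = sorted(x[:t]) + sorted(x[t:])   # stand-ins for two t-sorting networks
    for k in range(t):                  # x_{k+1} : x_{2t-k}, in 1-based terms
        a, b = k, 2 * t - 1 - k
        if z[a] > z[b]:
            z[a], z[b] = z[b], z[a]
    return z


def all_selected_correctly(t):
    """Exhaustive check over every ordering of 0, 1, ..., 2t-1."""
    return all(sorted(alekseev_largest(list(p))[t:]) == list(range(t, 2 * t))
               for p in permutations(range(2 * t)))


rng = random.Random(5)
x = rng.sample(range(100), 10)
z = alekseev_largest(x)
print(sorted(z[5:]) == sorted(x)[5:])   # True
```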

If we want to select the t largest of nt elements, we can apply Alekseev’s procedure n − 1 times, eliminating t elements each time; hence

$$\hat U_t(nt) \le (n-1)\bigl(2\hat S(t) + t\bigr). \qquad (18)$$

Fig. 54. Separating the largest four from the smallest four. (Numbers on these lines are used in the proof of Theorem A.)

Alekseev also derived an interesting lower bound for the selection problem:

Theorem A. $\hat U_t(n) \ge (n - t)\lceil \lg(t+1)\rceil$.

Proof. It is most convenient to consider the equivalent problem of selecting the smallest t elements. We can attach numbers (l, u) to each line of a comparator network, as shown in Fig. 54, where l and u denote respectively the minimum and maximum values that can appear at that position when the input is a permutation of {1, 2, ..., n}. Let $l_i$ and $l_j$ be the lower bounds on lines i and j before a comparison of $x_i{:}x_j$, and let $l'_i$ and $l'_j$ be the corresponding lower bounds after the comparison. It is obvious that $l'_i = \min(l_i, l_j)$; exercise 24 proves the (nonobvious) relation

$$l'_j \le l_i + l_j. \qquad (19)$$

Fig. 55. Another interpretation for the network of Fig. 54.

Now  let  us  reinterpret  the  network  operations  in  another  way  (see  Fig.  55): 
All  input  lines  are  assumed  to  contain  zero,  and  each  “comparator”  now  places 
the  smaller  of  its  inputs  on  the  upper  line  and  the  larger  plus  one  on  the  lower 
line. The resulting numbers $(m_1, m_2, \ldots, m_n)$ have the property that

$$2^{m_i} \ge l_i$$

throughout the network, since this holds initially and it is preserved by each comparator because of (19). Furthermore, the final value of

$$m_1 + m_2 + \cdots + m_n$$

is the total number of comparators in the network, since each comparator adds unity to this sum.

If the network selects the smallest t numbers, n − t of the $l_i$ are ≥ t + 1; hence n − t of the $m_i$ must be $\ge \lceil \lg(t+1)\rceil$. ∎

Table 1
COMPARISONS NEEDED IN SELECTION NETWORKS $(\hat U_t(n), \hat V_t(n), \hat W_t(n))$

         t = 1     t = 2     t = 3      t = 4      t = 5     t = 6
n = 1    (0,0,0)
n = 2    (1,1,1)   (0,1,1)
n = 3    (2,2,2)   (2,3,3)   (0,2,3)
n = 4    (3,3,3)   (4,5,5)   (3,5,5)    (0,3,5)
n = 5    (4,4,4)   (6,7,7)   (6,7,8)    (4,7,9)    (0,4,9)
n = 6    (5,5,5)   (8,9,9)   (8,10,10)  (8,10,12)  (5,9,12)  (0,5,12)
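Both sets of numbers in the proof of Theorem A can be computed for a small network. This sketch finds each lower bound $l_k$ by brute force over all permutations of {1, ..., n}, updates the counters $m_k$ by the rule of Fig. 55 (upper line gets the minimum, lower line gets the maximum plus one), and asserts $2^{m_k} \ge l_k$ after every comparator. It assumes standard comparators (i, j) with i < j on lines numbered from 0, and the function name is hypothetical.

```python
from itertools import permutations


def check_counters(comparators, n):
    """Track l_k (by brute force) and the counters m_k after each standard
    comparator (i, j), i < j; assert 2**m_k >= l_k throughout and return
    the final counters."""
    perms = [list(p) for p in permutations(range(1, n + 1))]
    m = [0] * n
    for i, j in comparators:
        for z in perms:                          # run the comparator on every input
            if z[i] > z[j]:
                z[i], z[j] = z[j], z[i]
        m[i], m[j] = min(m[i], m[j]), max(m[i], m[j]) + 1
        lows = [min(z[k] for z in perms) for k in range(n)]
        assert all(2 ** m[k] >= lows[k] for k in range(n))
    assert sum(m) == len(comparators)            # each comparator adds unity
    return m


net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]   # [1:2][3:4][1:3][2:4][2:3]
print(check_counters(net4, 4))                     # [0, 1, 2, 2]
```

For this 4-sorter, which in particular selects the smallest two, the two lower lines end with counters of at least $\lceil \lg 3\rceil = 2$, as the proof requires.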

The lower bound in Theorem A turns out to be exact when t = 1 and when t = 2 (see exercise 19). Table 1 gives some values of $\hat U_t(n)$, $\hat V_t(n)$, and $\hat W_t(n)$ for small t and n. Andrew Yao [Ph.D. thesis, U. of Illinois (1975)] determined the asymptotic behavior of $\hat U_t(n)$ for fixed t, by showing that $\hat U_3(n) = 2n + \lg n + O(1)$ and $\hat U_t(n) = n\lceil \lg(t+1)\rceil + O\bigl((\log n)^{\lfloor \lg t\rfloor}\bigr)$ as $n \to \infty$; the minimum delay time is $\lg n + \lfloor \lg t\rfloor \lg\lg n + O(\log\log\log n)$. N. Pippenger [SICOMP 20 (1991), 878–887] has proved by nonconstructive methods that for any ε > 0 there exist selection networks with $\hat U_{\lceil n/2\rceil}(n) \le (2+\epsilon)\, n \lg n$, whenever n is sufficiently large (depending on ε).

EXERCISES  — First  Set 

Several of the following exercises develop the theory of sorting networks in detail, and it is convenient to introduce some notation. We let [i:j] stand for a comparison/interchange module. A network with n inputs and r comparator modules is written $[i_1{:}j_1][i_2{:}j_2]\ldots[i_r{:}j_r]$, where each of the i’s and j’s is ≤ n; we shall call it an n-network for short. A network is called standard if $i_q < j_q$ for 1 ≤ q ≤ r. Thus, for example, Fig. 44 on page 221 depicts a standard 4-network, denoted by the comparator sequence [1:2][3:4][1:3][2:4][2:3].

The text’s convention for drawing network diagrams represents only standard networks; all comparators [i:j] are represented by a line from i to j, where i < j. When nonstandard networks must be drawn, we can use an arrow from i to j, indicating that the larger number goes to the point of the arrow. For example, Fig. 56 illustrates a nonstandard network for 16 elements, whose comparators are [1:2][4:3][5:6][8:7] . . . . Exercise 11 proves that Fig. 56 is a sorting network.

If $x = (x_1, \ldots, x_n)$ is an n-vector and $\alpha$ is an n-network, we write $x\alpha$ for the vector of numbers $((x\alpha)_1, \ldots, (x\alpha)_n)$ produced by the network. For brevity, we also let $a \vee b = \max(a,b)$, $a \wedge b = \min(a,b)$, $\bar a = 1 - a$. Thus $(x[i{:}j])_i = x_i \wedge x_j$, $(x[i{:}j])_j = x_i \vee x_j$, and $(x[i{:}j])_k = x_k$ when $i \ne k \ne j$. We say $\alpha$ is a sorting network if $(x\alpha)_i \le (x\alpha)_{i+1}$ for all x and for 1 ≤ i < n.

Fig. 56. A nonstandard sorting network based on bitonic sorting.

The symbol $e^{(i)}$ stands for a vector that has 1 in position i, 0 elsewhere; thus $(e^{(i)})_j = \delta_{ij}$. The symbol $D_n$ stands for the set of all $2^n$ n-place vectors of 0s and 1s, and $P_n$ stands for the set of all n! vectors that are permutations of {1, 2, ..., n}. We write $x \wedge y$ and $x \vee y$ for the vectors $(x_1 \wedge y_1, \ldots, x_n \wedge y_n)$ and $(x_1 \vee y_1, \ldots, x_n \vee y_n)$, and we write $x \subseteq y$ if $x_i \le y_i$ for all i. Thus $x \subseteq y$ if and only if $x \vee y = y$ if and only if $x \wedge y = x$. If x and y are in $D_n$, we say that x covers y if $x = (y \vee e^{(i)}) \ne y$ for some i. Finally for all x in $D_n$ we let ν(x) be the number of 1s in x, and ζ(x) the number of 0s; thus ν(x) + ζ(x) = n.

1.  [20]  Draw  a network  diagram  for  the  odd-even  merge  when  m = 3 and  n = 5. 

2. [22] Show that V. Pratt’s sorting algorithm (exercise 5.2.1–30) leads to a sorting network for n elements that has approximately $(\log_2 n)(\log_3 n)$ levels of delay. Draw the corresponding network for n = 12.

3. [M20] (K. E. Batcher.) Find a simple relation between C(m, m − 1) and C(m, m).

► 4. [M23] Prove that $\hat T(6) = 5$.

5. [M16] Prove that (13) is the delay time associated with the sorting network outlined in (10).

6. [28] Let T(n) be the minimum number of stages needed to sort n distinct numbers by making simultaneous disjoint comparisons (without necessarily obeying the network constraint); such comparisons can be represented as a node containing a set of pairs $\{i_1{:}j_1, i_2{:}j_2, \ldots, i_r{:}j_r\}$ where $i_1, j_1, i_2, j_2, \ldots, i_r, j_r$ are distinct, with $2^r$ branches below this node for the respective cases

$$(K_{i_1} < K_{j_1},\ K_{i_2} < K_{j_2},\ \ldots,\ K_{i_r} < K_{j_r}),$$
$$(K_{i_1} > K_{j_1},\ K_{i_2} < K_{j_2},\ \ldots,\ K_{i_r} < K_{j_r}),$$

etc. Prove that T(5) = T(6) = 5.
7. [25] Show that if the final three comparators of the network for n = 10 in Fig. 49 are replaced by the “weaker” sequence [5:6][4:5][6:7], the network will still sort.

8. [M20] Prove that $\hat M(m_1+m_2, n_1+n_2) \ge \hat M(m_1, n_1) + \hat M(m_2, n_2) + \min(m_1, n_2)$, for $m_1, m_2, n_1, n_2 \ge 0$.

9. [M25] (R. W. Floyd.) Prove that $\hat M(3,3) = 6$, $\hat M(4,4) = 9$, $\hat M(5,5) = 13$.

10. [M22] Prove that Batcher’s bitonic sorter, as defined in the remarks preceding (15), is valid. [Hint: It is only necessary to prove that all sequences consisting of k 1s followed by l 0s followed by n − k − l 1s will be sorted.]

11. [M23] Prove that Batcher’s bitonic sorter of order $2^t$ will not only sort sequences $(z_0, z_1, \ldots, z_{2^t-1})$ for which $z_0 \ge \cdots \ge z_k \le \cdots \le z_{2^t-1}$; it also will sort any sequence for which $z_0 \le \cdots \le z_k \ge \cdots \ge z_{2^t-1}$. [As a consequence, the network in Fig. 56 will sort 16 elements, since each stage consists of bitonic sorters or reverse-order bitonic sorters, applied to sequences that have been sorted in opposite directions.]

12. [M20] Prove or disprove: If x and y are bitonic sequences of the same length, so are $x \vee y$ and $x \wedge y$.

► 13. [24] (H. S. Stone.) Show that a sorting network for $2^t$ elements can be constructed by following the pattern illustrated for t = 4 in Fig. 57. Each of the $t^2$ steps in this scheme consists of a “perfect shuffle” of the first $2^{t-1}$ elements with the last $2^{t-1}$, followed by simultaneous operations performed on $2^{t-1}$ pairs of adjacent elements. Each of the latter operations is either “0” (no operation), “+” (a standard comparator module), or “−” (a reverse comparator module). The sorting proceeds in t stages of t steps each; during the last stage all operations are “+”. During stage s, for s < t, we do t − s steps in which all operations are “0”, followed by s steps in which the operations within step q consist alternately of $2^{q-1}$ “+” followed by $2^{q-1}$ “−”, for q = 1, 2, ..., s.

[Note that this sorting scheme could be performed by a fairly simple device whose circuitry performs one “shuffle-and-operate” step and feeds the output lines back into the input. The first three steps in Fig. 57 could of course be eliminated; they have been retained only to make the pattern clear. Stone notes that the same pattern “shuffle/operate” occurs in several other algorithms, such as the fast Fourier transform (see 4.6.4–(40)).]
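Stone’s scheme can be simulated to watch it sort. The sketch below is one reading of the rules of exercise 13, not Stone’s own formulation: `perfect_shuffle` interleaves the two halves, pairs are positions (2k, 2k + 1), “+” leaves the smaller value first, “−” leaves the larger value first, and all names are hypothetical. The zero-one check covers every input for t ≤ 3.

```python
from itertools import product


def stone_steps(t):
    """Operation lists for the t*t shuffle-and-operate steps of exercise 13,
    one entry ("0", "+", or "-") per adjacent pair (2k, 2k+1)."""
    half = 2 ** (t - 1)
    steps = []
    for s in range(1, t + 1):               # stage s
        for step in range(1, t + 1):
            q = step - (t - s)              # q >= 1 on the s active steps
            if s == t:
                ops = ["+"] * half          # the last stage is all "+"
            elif q < 1:
                ops = ["0"] * half          # the t - s idle steps
            else:                           # 2^(q-1) "+", then 2^(q-1) "-", ...
                ops = ["+" if (k >> (q - 1)) % 2 == 0 else "-" for k in range(half)]
            steps.append(ops)
    return steps


def perfect_shuffle(z):
    """Interleave the first half of z with the second half."""
    half = len(z) // 2
    return [z[k // 2 + (k % 2) * half] for k in range(len(z))]


def stone_sort(values):
    """Apply the scheme to a list whose length is 2**t with t >= 1."""
    z = list(values)
    t = len(z).bit_length() - 1
    for ops in stone_steps(t):
        z = perfect_shuffle(z)
        for k, op in enumerate(ops):
            a, b = 2 * k, 2 * k + 1
            if (op == "+" and z[a] > z[b]) or (op == "-" and z[a] < z[b]):
                z[a], z[b] = z[b], z[a]
    return z


def sorts_all_01(t):
    """Zero-one check of the scheme for 2**t elements."""
    return all(stone_sort(x) == sorted(x) for x in product((0, 1), repeat=2 ** t))


print(stone_sort([5, 3, 7, 0, 6, 1, 4, 2]))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this reading, stage s performs a bitonic merge of blocks of size $2^s$, with the direction of each block set by the alternating “+”/“−” pattern, and after every t shuffles the elements are back in their original positions.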

► 14. [M27] (V. E. Alekseev.) Let $\alpha = [i_1{:}j_1]\ldots[i_r{:}j_r]$ be an n-network; for 1 ≤ s ≤ r we define $\alpha^s = [i'_1{:}j'_1]\ldots[i'_{s-1}{:}j'_{s-1}][i_s{:}j_s]\ldots[i_r{:}j_r]$, where the $i'_k$ and $j'_k$ are obtained from $i_k$ and $j_k$ by changing $i_s$ to $j_s$ and changing $j_s$ to $i_s$ wherever they appear. For example, if $\alpha$ = [1:2][3:4][1:3][2:4][2:3], then $\alpha^4$ = [1:4][3:2][1:3][2:4][2:3].

a) Prove that $D_n\alpha = D_n(\alpha^s)$.

b) Prove that $(\alpha^s)^t = (\alpha^t)^s$.

c) A conjugate of $\alpha$ is any network of the form $(\ldots((\alpha^{s_1})^{s_2})\ldots)^{s_k}$. Prove that $\alpha$ has at most $2^{r-1}$ conjugates.

d) Let $g_\alpha(x) = [x \in D_n\alpha]$, and let $f_\alpha(x) = (\bar x_{i_1} \vee x_{j_1}) \wedge \cdots \wedge (\bar x_{i_r} \vee x_{j_r})$. Prove that $g_\alpha(x) = \bigvee\{f_{\alpha'}(x) \mid \alpha' \text{ is a conjugate of } \alpha\}$.

e) Let $G_\alpha$ be the directed graph with vertices {1, ..., n} and with arcs $i_s \to j_s$ for 1 ≤ s ≤ r. Prove that $\alpha$ is a sorting network if and only if $G_{\alpha'}$ has an oriented path from i to i + 1 for 1 ≤ i < n and for all $\alpha'$ conjugate to $\alpha$. [This condition is somewhat remarkable, since $G_\alpha$ does not depend on the order of the comparators in $\alpha$.]
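The operation $\alpha \mapsto \alpha^s$, and parts (a) through (c) for the example network, can be explored by brute force. In this sketch (hypothetical helper names), a module (i, j) on 1-based lines leaves the smaller value on line i, so (3, 2) stands for the reversed comparator [3:2]; the checks are evidence for one example, not a proof.

```python
from itertools import product


def conjugate(alpha, s):
    """alpha^s (s is 1-based): exchange the line numbers i_s and j_s in the
    first s-1 comparators, leaving comparators s..r unchanged."""
    i_s, j_s = alpha[s - 1]
    swap = {i_s: j_s, j_s: i_s}
    head = [(swap.get(i, i), swap.get(j, j)) for i, j in alpha[: s - 1]]
    return head + list(alpha[s - 1:])


def run(alpha, x):
    """Apply modules (i, j) on 1-based lines; the smaller value goes to line i."""
    z = list(x)
    for i, j in alpha:
        if z[i - 1] > z[j - 1]:
            z[i - 1], z[j - 1] = z[j - 1], z[i - 1]
    return tuple(z)


def image(alpha, n):
    """The set D_n alpha of outputs over all 0-1 inputs."""
    return {run(alpha, x) for x in product((0, 1), repeat=n)}


def conjugates(alpha):
    """Closure of alpha under the operations alpha -> alpha^s."""
    seen, todo = {tuple(alpha)}, [list(alpha)]
    while todo:
        beta = todo.pop()
        for s in range(1, len(beta) + 1):
            gamma = tuple(conjugate(beta, s))
            if gamma not in seen:
                seen.add(gamma)
                todo.append(list(gamma))
    return seen


alpha = [(1, 2), (3, 4), (1, 3), (2, 4), (2, 3)]
print(conjugate(alpha, 4))   # [(1, 4), (3, 2), (1, 3), (2, 4), (2, 3)]
```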

15.  [20]  Find  a nonstandard  sorting  network  for  four  elements  that  has  only  five 
comparator  modules. 
16. [M22] Prove that the following algorithm transforms any sorting network $[i_1:j_1]\ldots[i_r:j_r]$
into a standard sorting network of the same length:

T1. Let $q$ be the smallest index such that $i_q > j_q$. If no such index exists, stop.

T2. Change all occurrences of $i_q$ to $j_q$, and all occurrences of $j_q$ to $i_q$, in all
comparators $[i_s:j_s]$ for $q \le s \le r$. Return to T1. ▮

Thus, [4:1][3:2][1:3][2:4][1:2][3:4] is first transformed into [1:4][3:2][4:3][2:1][4:2][3:1],
then [1:4][2:3][4:2][3:1][4:3][2:1], then [1:4][2:3][2:4][3:1][2:3][4:1], etc., until the
standard network [1:4][2:3][2:4][1:3][1:2][3:4] is obtained.
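Steps T1 and T2 translate almost line for line into code. The sketch below uses hypothetical names and the same convention as before: the smaller key goes to line $i$ of $[i:j]$. It applies the two steps to the example, then uses the zero-one principle to confirm that both the original network and the transformed one sort.

```python
def standardize(network):
    """Apply steps T1 and T2 until every comparator [i:j] has i < j."""
    net = [tuple(c) for c in network]
    while True:
        # T1: find the smallest q with i_q > j_q; stop if there is none.
        q = next((k for k, (i, j) in enumerate(net) if i > j), None)
        if q is None:
            return net
        # T2: interchange line numbers i_q and j_q in comparators q, ..., r.
        a, b = net[q]
        swap = {a: b, b: a}
        net[q:] = [(swap.get(i, i), swap.get(j, j)) for i, j in net[q:]]


def sorts_all_01(network, n):
    """Zero-one principle check; [i:j] puts the smaller key on line i."""
    for m in range(2 ** n):
        y = [(m >> k) & 1 for k in range(n)]
        for i, j in network:
            y[i - 1], y[j - 1] = min(y[i - 1], y[j - 1]), max(y[i - 1], y[j - 1])
        if y != sorted(y):
            return False
    return True


alpha = [(4, 1), (3, 2), (1, 3), (2, 4), (1, 2), (3, 4)]
beta = standardize(alpha)
print(beta)                                           # [(1, 4), (2, 3), (2, 4), (1, 3), (1, 2), (3, 4)]
print(sorts_all_01(alpha, 4), sorts_all_01(beta, 4))  # True True
```

Each application of T2 turns comparator $q$ into a standard one and leaves earlier comparators alone, so $q$ strictly increases and the loop terminates.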

17. [M25] Let $D_{tn}$ be the set of all $\binom{n}{t}$ sequences $(x_1, \ldots, x_n)$ of 0s and 1s having
exactly $t$ 1s. Show that $U_t(n)$ is the minimum number of comparators needed in a
network that sorts all the elements of $D_{tn}$; $V_t(n)$ is the minimum number needed to
sort $D_{tn} \cup D_{(t-1)n}$; and $W_t(n)$ is the minimum number needed to sort $\bigcup_{0 \le k \le t} D_{kn}$.

► 18. [M20] Prove that a network that finds the median of $2t-1$ elements requires at
least $(t-1)\lceil\lg(t+1)\rceil + \lceil\lg t\rceil$ comparator modules. [Hint: See the proof of Theorem A.]

19. [M22] Prove that $U_2(n) = 2n - 4$ and $V_2(n) = 2n - 3$, for all $n \ge 2$.

20. [28] Prove that (a) $V_3(5) = 7$; (b) $U_4(n) \le 3n - 10$ for $n \ge 6$.

21.  [21]  True  or  false:  Inserting  a new  standard  comparator  into  any  standard  sorting 
network  yields  another  standard  sorting  network. 

22. [M17] Let $\alpha$ be any $n$-network, and let $x$ and $y$ be $n$-vectors.

a) Prove that $x \subseteq y$ implies that $x\alpha \subseteq y\alpha$.

b) Prove that $x \cdot y \le (x\alpha) \cdot (y\alpha)$, where $x \cdot y$ denotes the dot product $x_1y_1 + \cdots + x_ny_n$.

23. [M18] Let $\alpha$ be an $n$-network. Prove that there is a permutation $p \in P_n$ such
that $(p\alpha)_i = j$ if and only if there are vectors $x$ and $y$ in $D_n$ such that $x$ covers $y$,
$(x\alpha)_i = 1$, $(y\alpha)_i = 0$, and $\zeta(y) = j$.

► 24. [M21] (V. E. Alekseev.) Let $\alpha$ be an $n$-network, and for $1 \le k \le n$ let

$l_k = \min\{(p\alpha)_k \mid p \in P_n\}, \qquad u_k = \max\{(p\alpha)_k \mid p \in P_n\}$

denote the lower and upper bounds on the range of values that may appear in line $k$ of
the output. Let $l'_k$ and $u'_k$ be defined similarly for the network $\alpha' = \alpha[i:j]$. Prove that

$l'_i = l_i \wedge l_j, \qquad l'_j \le l_i + l_j, \qquad u'_i \ge u_i + u_j - (n+1), \qquad u'_j = u_i \vee u_j.$

[Hint: Given vectors $x$ and $y$ in $D_n$ with $(x\alpha)_i = (y\alpha)_j = 0$, $\zeta(x) = l_i$, and $\zeta(y) = l_j$,
find a vector $z$ in $D_n$ with $(z\alpha')_j = 0$, $\zeta(z) \le l_i + l_j$.]

25. [M30] Let $l_k$ and $u_k$ be as defined in exercise 24. Prove that all integers between
$l_k$ and $u_k$ inclusive are in the set $\{(p\alpha)_k \mid p \text{ in } P_n\}$.

26. [M24] (R. W. Floyd.) Let $\alpha$ be an $n$-network. Prove that one can determine the
set $D_n\alpha = \{x\alpha \mid x \text{ in } D_n\}$ from the set $P_n\alpha = \{p\alpha \mid p \text{ in } P_n\}$; conversely, $P_n\alpha$ can be
determined from $D_n\alpha$.

► 27. [M20] Let $x$ and $y$ be vectors, and let $x\alpha$ and $y\alpha$ be sorted. Prove that $(x\alpha)_i \le
(y\alpha)_j$ if and only if, for every choice of $j$ elements from $y$, we can choose $i$ elements
from $x$ such that every chosen $x$ element is $\le$ some chosen $y$ element. Use this principle
to prove that if we sort the rows of any matrix, then sort the columns, the rows will
remain in order.
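The consequence stated at the end of exercise 27 is easy to observe experimentally, though a test is of course no proof. This sketch, with hypothetical helper names, sorts the rows of a small matrix and then its columns, and checks that the rows are still in order.

```python
def sort_rows_then_columns(matrix):
    """Sort every row, then sort every column of the result."""
    rows = [sorted(r) for r in matrix]
    cols = [sorted(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def rows_sorted(matrix):
    """True if every row is in nondecreasing order."""
    return all(r[k] <= r[k + 1] for r in matrix for k in range(len(r) - 1))


result = sort_rows_then_columns([[9, 2, 7], [4, 8, 1], [6, 3, 5]])
print(result)               # [[1, 4, 6], [2, 5, 8], [3, 7, 9]]
print(rows_sorted(result))  # True
```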
► 28. [M20] The following diagram illustrates the fact that we can systematically write
down formulas for the contents of all lines in a sorting network in terms of the inputs:

Inputs (lines 1 to 4):   a,  b,  c,  d
After [1:2][3:4]:        a ∧ b,  a ∨ b,  c ∧ d,  c ∨ d
After [1:3][2:4]:        (a ∧ b) ∧ (c ∧ d),  (a ∨ b) ∧ (c ∨ d),  (a ∧ b) ∨ (c ∧ d),  (a ∨ b) ∨ (c ∨ d)
After [2:3]:             (a ∧ b) ∧ (c ∧ d),
                         ((a ∨ b) ∧ (c ∨ d)) ∧ ((a ∧ b) ∨ (c ∧ d)),
                         ((a ∨ b) ∧ (c ∨ d)) ∨ ((a ∧ b) ∨ (c ∧ d)),
                         (a ∨ b) ∨ (c ∨ d)


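Using the commutative laws $x \wedge y = y \wedge x$, $x \vee y = y \vee x$, the associative laws $x \wedge (y \wedge z) =
(x \wedge y) \wedge z$, $x \vee (y \vee z) = (x \vee y) \vee z$, the distributive laws $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$,
$x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$, the absorption laws $x \wedge (x \vee y) = x \vee (x \wedge y) = x$,
and the idempotent laws $x \wedge x = x \vee x = x$, we can reduce the formulas at the right
of this network to $(a \wedge b \wedge c \wedge d)$, $(a \wedge b \wedge c) \vee (a \wedge b \wedge d) \vee (a \wedge c \wedge d) \vee (b \wedge c \wedge d)$,
$(a \wedge b) \vee (a \wedge c) \vee (a \wedge d) \vee (b \wedge c) \vee (b \wedge d) \vee (c \wedge d)$, and $a \vee b \vee c \vee d$, respectively.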

Prove that, in general, the $t$th largest element of $\{x_1, \ldots, x_n\}$ is given by the
“elementary symmetric function”

$\sigma_t(x_1, \ldots, x_n) = \bigvee\{x_{i_1} \wedge x_{i_2} \wedge \cdots \wedge x_{i_t} \mid 1 \le i_1 < i_2 < \cdots < i_t \le n\}.$

[There are $\binom{n}{t}$ terms being $\vee$’d together. Thus the problem of finding minimum-cost
sorting networks is equivalent to the problem of computing the elementary symmetric
functions with a minimum of “and/or” circuits, where at every stage we are required
to replace two quantities $\phi$ and $\psi$ by $\phi \wedge \psi$ and $\phi \vee \psi$.]
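If we read $\wedge$ as min and $\vee$ as max, the formula for $\sigma_t$ can be evaluated directly. The sketch below (name hypothetical) is for checking only, not for computing, since it takes $\binom{n}{t}$ minimums. On a sample vector it confirms that $\sigma_t$ yields the $t$th largest element.

```python
from itertools import combinations


def sigma(t, xs):
    """sigma_t(x_1, ..., x_n): the OR (max) over all t-subsets of the AND (min)."""
    return max(min(c) for c in combinations(xs, t))


xs = [3, 1, 4, 1, 5, 9, 2, 6]
print([sigma(t, xs) for t in range(1, len(xs) + 1)])   # [9, 6, 5, 4, 3, 2, 1, 1]
print(sorted(xs, reverse=True))                          # [9, 6, 5, 4, 3, 2, 1, 1]
```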

29. [M20] Given that $x_1 \le x_2 \le x_3$ and $y_1 \le y_2 \le y_3 \le y_4 \le y_5$, and that $z_1 \le z_2 \le
\cdots \le z_8$ is the result of merging the $x$’s with the $y$’s, find formulas for each of the $z$’s
in terms of the $x$’s and the $y$’s, using the operators $\wedge$ and $\vee$.

30. [HM24] Prove that any formula involving $\wedge$ and $\vee$ and the independent variables
$\{x_1, \ldots, x_n\}$ can be reduced using the identities in exercise 28 to a “canonical” form
$\tau_1 \vee \tau_2 \vee \cdots \vee \tau_k$, where $k \ge 1$, each $\tau_i$ has the form $\bigwedge\{x_j \mid j \text{ in } S_i\}$ where $S_i$ is a
subset of $\{1, 2, \ldots, n\}$, and no set $S_i$ is included in $S_j$ for $i \ne j$. Prove also that two
such canonical forms are equal for all $x_1, \ldots, x_n$ if and only if they are identical (up to
order).

31. [M24] (R. Dedekind, 1897.) Let $\delta_n$ be the number of distinct canonical forms on
$x_1, \ldots, x_n$ in the sense of exercise 30. Thus $\delta_1 = 1$, $\delta_2 = 4$, and $\delta_3 = 18$. What is $\delta_4$?

32. [M28] (M. W. Green.) Let $G_1 = \{00, 01, 11\}$, and let $G_{t+1}$ be the set of all strings
$\theta\phi\psi\omega$ such that $\theta$, $\phi$, $\psi$, $\omega$ have length $2^{t-1}$ and $\theta\phi$, $\psi\omega$, $\theta\psi$, and $\phi\omega$ are in $G_t$. Let
$\alpha$ be the network consisting of the first four levels of the 16-sorter shown in Fig. 49.
Show that $D_{16}\alpha = G_4$, and prove that it has exactly $\delta_4 + 2$ elements. (See exercise 31.)

► 33. [M22] Not all $\delta_n$ of the functions of $(x_1, \ldots, x_n)$ in exercise 31 can appear in
comparator networks. In fact, prove that the function $(x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_3 \wedge x_4)$
cannot appear as an output of any comparator network on $(x_1, \ldots, x_n)$.

34.  [23]  Is  the  following  a sorting  network? 


35. [20] Prove that any standard sorting network must contain each of the adjacent
comparators $[i:i+1]$, for $1 \le i < n$, at least once.

► 36. [22] The network of Fig. 47 involves only adjacent comparisons $[i:i+1]$; let us call
such a network primitive.

a) Prove that a primitive sorting network for $n$ elements must have at least $\binom{n}{2}$
comparators. [Hint: Consider the inversions of a permutation.]

b) (R. W. Floyd, 1964.) Let $\alpha$ be a primitive network for $n$ elements, and let $x$ be a
vector such that $(x\alpha)_i > (x\alpha)_j$ for some $i < j$. Prove that $(y\alpha)_i > (y\alpha)_j$, where
$y$ is the vector $(n, n-1, \ldots, 1)$.

c) As a consequence of (b), a primitive network is a sorting network if and only if it
sorts the single vector $(n, n-1, \ldots, 1)$.

37. [M22] The odd-even transposition sort for $n$ numbers, $n \ge 3$, is a network $n$ levels
deep with $\frac12 n(n-1)$ comparators, arranged in a brick-like pattern as shown in Fig. 58.
(When $n$ is even, there are two possibilities.) Such a sort is especially easy to implement
in hardware, since only two kinds of actions are performed alternately. Prove that
such a network is, in fact, a valid sorting network. [Hint: See exercise 36.]
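Exercises 36(c) and 37 suggest a cheap experiment. The sketch below builds the odd-even transposition network from 1-based comparators $[i:i+1]$. The parameter `first` selects where level 1 starts; it is our own way of expressing the two possibilities for even $n$, and like the function names it is hypothetical. The network is then tested in two ways. The first uses the single vector $(n, n-1, \ldots, 1)$, which by 36(c) suffices for primitive networks only. The second uses all $2^n$ vectors of 0s and 1s.

```python
def odd_even_transposition(n, first=1):
    """Comparators [i:i+1] of the n-level odd-even transposition sort.
    Level 1 starts at line `first` (1 or 2); later levels alternate."""
    net = []
    for level in range(n):
        start = first if level % 2 == 0 else 3 - first
        net.extend((i, i + 1) for i in range(start, n, 2))
    return net


def run_network(network, x):
    """Apply 1-based standard comparators [i:j], i < j."""
    y = list(x)
    for i, j in network:
        if y[i - 1] > y[j - 1]:
            y[i - 1], y[j - 1] = y[j - 1], y[i - 1]
    return y


def sorts_reversal(network, n):
    """Floyd's test from exercise 36(c); valid only for primitive networks."""
    return run_network(network, range(n, 0, -1)) == list(range(1, n + 1))


def sorts_all_01(network, n):
    """Zero-one principle: try every vector of 0s and 1s."""
    for m in range(2 ** n):
        x = [(m >> k) & 1 for k in range(n)]
        if run_network(network, x) != sorted(x):
            return False
    return True


net = odd_even_transposition(6)
print(len(net), sorts_reversal(net, 6), sorts_all_01(net, 6))   # 15 True True
```

Deleting any one comparator leaves fewer than $\binom{n}{2}$, so by 36(a) the result cannot sort. Both tests report this failure.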


Fig. 58. The odd-even transposition sort (shown for n = 5 and, in its two forms, for n = 6).

► 38. [43] Let $N = \binom{n}{2}$. Find a one-to-one correspondence between Young tableaux of
shape $(n-1, n-2, \ldots, 1)$ and primitive sorting networks $[i_1:i_1+1]\ldots[i_N:i_N+1]$. [Consequently,
by Theorem 5.1.4H there are exactly

$\dfrac{N!}{1^{n-1}\,3^{n-2}\,5^{n-3}\cdots(2n-3)^1}$

such sorting networks.] Hint: Exercise 36(c) shows that primitive networks without
redundant comparators correspond to paths from $1\,2\ldots n$ to $n\ldots 2\,1$ in polyhedra like
Fig. 1 in Section 5.1.1.
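The count in exercise 38 can be checked for small $n$. This sketch (names hypothetical) evaluates the hook-length formula above. Independently, it counts the sequences of $N$ adjacent comparators that sort $(n, \ldots, 2, 1)$; by exercise 36(c) these are exactly the primitive sorting networks of length $N$. The brute force is practical only for $n \le 4$ or so.

```python
from itertools import product
from math import comb, factorial, prod


def hook_formula(n):
    """N! / (1^(n-1) 3^(n-2) 5^(n-3) ... (2n-3)^1), where N = n(n-1)/2."""
    N = comb(n, 2)
    return factorial(N) // prod((2 * k - 1) ** (n - k) for k in range(1, n))


def count_primitive_sorting_networks(n):
    """Brute force over all sequences [i_1:i_1+1] ... [i_N:i_N+1]."""
    N = comb(n, 2)
    count = 0
    for seq in product(range(1, n), repeat=N):
        y = list(range(n, 0, -1))            # the vector (n, n-1, ..., 1)
        for i in seq:
            if y[i - 1] > y[i]:
                y[i - 1], y[i] = y[i], y[i - 1]
        count += y == sorted(y)
    return count


for n in (3, 4):
    print(n, hook_formula(n), count_primitive_sorting_networks(n))
# 3 2 2
# 4 16 16
```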

39. [25] Suppose that a primitive comparator network on $n$ lines is known to sort the
single input $1010\ldots10$ correctly. (See exercise 36; assume that $n$ is even.) Show that
its “middle third,” consisting of all comparators that involve only lines $\lceil n/3\rceil$ through
$\lfloor 2n/3\rfloor$ inclusive, will sort all inputs.

40. [HM44] Comparators $[i_1:i_1+1][i_2:i_2+1]\ldots[i_r:i_r+1]$ are chosen at random, with
each value of $i_k \in \{1, 2, \ldots, n-1\}$ equally likely; the process stops when the network
contains a bubble sort configuration like that of Fig. 47 as a subnetwork. Prove that
$r \le 4n^2 + O(n^{3/2}\log n)$, except with probability $O(n^{-1000})$.

41. [M47] Comparators $[i_1:j_1][i_2:j_2]\ldots[i_r:j_r]$ are chosen at random, with each
irredundant choice $1 \le i_k < j_k \le n$ equally likely; the process stops when a sorting network
has been obtained. Estimate the expected value of $r$; is it $O(n^{1+\epsilon})$ for all $\epsilon > 0$?

► 42. [25] (D. Van Voorhis.) Prove that $S(n) \ge S(n-1) + \lceil\lg n\rceil$.
43. [48] Find an $(m, n)$-merging network with fewer than $C(m, n)$ comparators, or
prove that no such network exists.

44.  [50]  Find  the  exact  value  of  S(n)  for  some  n > 8. 

45. [M20] Prove that any $(1, n)$-merging network without multiple fanout must have
at least $\lceil\lg(n+1)\rceil$ levels of delay.

► 46. [30] (M. Aigner.) Show that the minimum number of stages needed to merge $m$
elements with $n$, using any algorithm that does simultaneous disjoint comparisons as in
exercise 6, is at least $\lceil\lg(m+n)\rceil$; hence the bitonic merging network has optimum delay.

47. [47] Is the function $T(n)$ of exercise 6 strictly less than $\hat T(n)$ for some $n$?

► 48. [26] We can interpret sorting networks in another way, letting each line carry
a multiset of $m$ numbers instead of a single number; under this interpretation, the
operation $[i:j]$ replaces $x_i$ and $x_j$, respectively, by $x_i \wedge x_j$ and $x_i \vee x_j$, the least $m$ and
the greatest $m$ of the $2m$ numbers $x_i \uplus x_j$. (For example, the diagram


input    after [1:2]   after [3:4]   after [1:3]   after [2:4]   after [2:3]
{3,5}    {1,3}         {1,3}         {1,2}         {1,2}         {1,2}
{1,8}    {5,8}         {5,8}         {5,8}         {5,7}         {2,3}
{2,9}    {2,9}         {2,2}         {2,3}         {2,3}         {5,7}
{2,7}    {2,7}         {7,9}         {7,9}         {8,9}         {8,9}


illustrates  this  interpretation  when  m = 2;  each  comparator  merges  its  inputs  and 
separates  the  lower  half  from  the  upper  half.) 

If $a$ and $b$ are multisets of $m$ numbers each, we say that $a \ll b$ if and only if
$a \wedge b = a$ (equivalently, $a \vee b = b$; the largest element of $a$ is less than or equal to the
smallest of $b$). Thus $a \wedge b \ll a \vee b$.

Let $\alpha$ be an $n$-network, and let $x = (x_1, \ldots, x_n)$ be a vector in which each $x_i$ is a
multiset of $m$ elements. Prove that if $(x\alpha)_i$ is not $\ll (x\alpha)_j$ in the interpretation above,
there is a vector $y$ in $D_n$ such that $(y\alpha)_i = 1$ and $(y\alpha)_j = 0$. [Consequently, a sorting
network for $n$ elements becomes a sorting network for $mn$ elements if we replace each
comparison by a merge network with $M(m, m)$ modules. Figure 59 shows an 8-element
sorter constructed from a 4-element sorter by using this observation.]
Fig. 59. An 8-sorter constructed from a 4-sorter, by using the merging interpretation.
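Under the multiset interpretation, each comparator is a merge followed by a split into lower and upper halves. The sketch below (names hypothetical; each multiset is kept as a sorted Python list) runs the 4-sorter $[1:2][3:4][1:3][2:4][2:3]$ with $m = 2$ on the inputs from the example in exercise 48.

```python
def merge_split(a, b):
    """Comparator [i:j] on multisets: the least m and the greatest m of a ⊎ b."""
    merged = sorted(a + b)
    m = len(a)
    return merged[:m], merged[m:]


def run_multiset_network(network, lines):
    """Apply 1-based comparators to a list of equal-size multisets."""
    lines = [sorted(x) for x in lines]
    for i, j in network:
        lines[i - 1], lines[j - 1] = merge_split(lines[i - 1], lines[j - 1])
    return lines


four_sorter = [(1, 2), (3, 4), (1, 3), (2, 4), (2, 3)]
print(run_multiset_network(four_sorter, [[3, 5], [1, 8], [2, 9], [2, 7]]))
# [[1, 2], [2, 3], [5, 7], [8, 9]]
```

Concatenating the output lines gives the eight input numbers in sorted order, as the bracketed remark in the exercise predicts.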


49. [M23] Show that, in the notation of exercise 48, $(x \wedge y) \wedge z = x \wedge (y \wedge z)$ and
$(x \vee y) \vee z = x \vee (y \vee z)$; however $(x \vee y) \wedge z$ is not always equal to $(x \wedge z) \vee (y \wedge z)$,
and $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$ does not always equal the middle $m$ elements of $x \uplus y \uplus z$.
Find a correct formula, in terms of $x$, $y$, $z$ and the $\wedge$ and $\vee$ operations, for those middle
elements.
50. [HM46] Explore the properties of the $\wedge$ and $\vee$ operations defined in exercise 48.
Is it possible to characterize all of the identities in this algebra in some nice way, or
to derive them all from a finite set of identities? In this regard, identities such as
$x \wedge x \wedge x = x \wedge x$, or $x \wedge (x \vee (x \wedge (x \vee y))) = x \wedge (x \vee y)$, which hold only for $m \le 2$,
are of comparatively little interest; consider only the identities that are true for all $m$.

► 51. [M25] (R. L. Graham.) The comparator $[i:j]$ is called redundant in the network
$\alpha_1[i:j]\alpha_2$ if either $(x\alpha_1)_i \le (x\alpha_1)_j$ for all vectors $x$, or $(x\alpha_1)_i \ge (x\alpha_1)_j$ for all
vectors $x$. Prove that if $\alpha$ is a network with $r$ irredundant comparators, there are
at least $r$ distinct ordered pairs $(i, j)$ of distinct indices such that $(x\alpha)_i \le (x\alpha)_j$ for all
vectors $x$. (Consequently, a network with no redundant comparators contains at most
$\binom{n}{2}$ modules.)


Fig. 60. A family of networks whose ability to sort is difficult to verify, illustrated for
m = 3 and n = 5. (See exercise 52.)


► 52. [32] (M. O. Rabin, 1980.) Prove that it is intrinsically difficult to decide in
general whether a sequence of comparators defines a sorting network, by considering
networks of the form sketched in Fig. 60. It is convenient to number the inputs $x_0$ to
$x_N$, where $N = 2mn + m + 2n$; the positive integers $m$ and $n$ are parameters. The
first comparators are $[j : j + 2nk]$ for $1 \le j \le 2n$ and $1 \le k \le m$. Then we have
$[2j-1 : 2j][0 : 2j]$ for $1 \le j \le n$, in parallel with a special subnetwork that uses only
indices $> 2n$. Next we compare $[0 : 2mn + 2n + j]$ for $1 \le j \le m$. And finally there is
a complete sorting network for $(x_1, \ldots, x_N)$, followed by $[0:1][1:2]\ldots[N-t-1 : N-t]$,
where $t = mn + n + 1$.

a) Describe all inputs $(x_0, x_1, \ldots, x_N)$ that are not sorted by such a network, in terms
of the behavior of the special subnetwork.

b) Given a set of clauses such as $(y_1 \vee y_2 \vee y_3) \wedge (y_2 \vee y_3 \vee y_4) \wedge \cdots$, explain how
to construct a special subnetwork such that Fig. 60 sorts all inputs if and only if
the clauses are unsatisfiable. [Hence the task of deciding whether a comparator
sequence forms a sorting network is co-NP-complete, in the sense of Section 7.9.]
53. [30] (Periodic sorting networks.) The following two 16-networks illustrate general
recursive constructions of $t$-level networks for $n = 2^t$ in the case $t = 4$:


(a)  (b) 


If we number the input lines from 0 to $2^t - 1$, the $l$th level in case (a) has comparators
$[i:j]$ where $i \bmod 2^{t+1-l} < 2^{t-l}$ and $j = i \oplus (2^{t+1-l} - 1)$; there are $t2^{t-1}$ comparators
altogether, as in the bitonic merge. In case (b) the first-level comparators are $[2j : 2j+1]$
for $0 \le j < 2^{t-1}$, and the $l$th-level comparators for $2 \le l \le t$ are $[2j+1 : 2j+2^{t+1-l}]$
for $0 \le j < 2^{t-1} - 2^{t-l}$; there are $(t-1)2^{t-1} + 1$ comparators altogether, as in the
odd-even merge.

If the input numbers are $2^k$-ordered in the sense of Theorem 5.2.1H, for some
$k \ge 1$, prove that both networks yield outputs that are $2^{k-1}$-ordered. Therefore we
can sort $2^t$ numbers by passing them through either network $t$ times. [When $t$ is large,
these sorting networks use roughly twice as many comparisons as Algorithm 5.2.2M;
but the total delay time is the same as in Fig. 57, and the implementation is simpler
because the same network is used repeatedly.]
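Construction (a) can be generated directly from its definition. The sketch below (names hypothetical; lines numbered from 0 as in the exercise) builds the $t$ levels and passes the keys through the same network $t$ times. By the exercise, that should leave $2^t$ keys sorted. Only case (a) is shown.

```python
def periodic_network_a(t):
    """Level l: [i:j] with i mod 2^(t+1-l) < 2^(t-l) and j = i XOR (2^(t+1-l) - 1)."""
    net = []
    for l in range(1, t + 1):
        block = 1 << (t + 1 - l)
        net.extend((i, i ^ (block - 1))
                   for i in range(1 << t) if i % block < block // 2)
    return net


def periodic_sort(keys, t):
    """Pass 2**t keys through network (a) t times."""
    x = list(keys)
    net = periodic_network_a(t)
    for _ in range(t):
        for i, j in net:              # always i < j here
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]
    return x


print(len(periodic_network_a(4)))                       # 32, that is, t * 2^(t-1)
print(periodic_sort([(7 * i + 3) % 16 for i in range(16)], 4) == list(range(16)))   # True
```

Each level pairs line $i$ with its mirror image inside a block of size $2^{t+1-l}$. This is why the comparator count matches the bitonic merge.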

54. [42] Study the properties of sorting networks made from $m$-sorter modules instead
of  2-sorters.  (For  example,  G.  Shapiro  has  constructed  the  network 


which sorts 16 elements using fourteen 4-sorters. Is this the best possible? Prove that
$m^2$ elements can be sorted with at most 16 levels of $m$-sorters, when $m$ is sufficiently
large.)

55. [23] A permutation network is a sequence of modules $[i_1:j_1]\ldots[i_r:j_r]$ where each
module $[i:j]$ can be set by external controls to pass its inputs unchanged or to switch
$x_i$ and $x_j$ (irrespective of the values of $x_i$ and $x_j$), and such that each permutation
of the inputs is achievable on the output lines by some setting of the modules. Every
sorting network is clearly a permutation network, but the converse is not true: Find a
permutation network for five elements that has only eight modules.

► 56. [25] Suppose the bit vector $x \in D_n$ is not sorted. Show that there is a standard
$n$-network $\alpha_x$ that fails to sort $x$, although it sorts all other elements of $D_n$.

57. [M35] The even-odd merge is similar to Batcher’s odd-even merge, except that
when $mn > 2$ it recursively merges the sequence $(x_{(m \bmod 2)+1}, \ldots, x_{m-3}, x_{m-1})$ with
$(y_1, y_3, \ldots, y_{2\lceil n/2\rceil-1})$ and $(x_{((m+1) \bmod 2)+1}, \ldots, x_{m-2}, x_m)$ with $(y_2, y_4, \ldots, y_{2\lfloor n/2\rfloor})$
before making a set of $\lceil m/2\rceil + \lceil n/2\rceil - 1$ comparison-interchanges analogous to (1).
Show that the even-odd merge achieves the optimum delay time $\lceil\lg(m+n)\rceil$ of bitonic
merging, without making more comparisons than the bitonic method. In fact, prove
that the number of comparisons $A(m, n)$ made by even-odd merging satisfies
$C(m, n) \le A(m, n) \le \frac12(m+n)\lg\min(m, n) + m + \frac12 n$.

EXERCISES  — Second  Set 

The following exercises deal with several different types of optimality questions related
to sorting. The first few problems are based on an interesting “multihead” generalization
of the bubble sort, investigated by P. N. Armstrong and R. J. Nelson as early
as 1954. [See U.S. Patents 3029413, 3034102.] Let $1 = h_1 < h_2 < \cdots < h_m = n$ be
an increasing sequence of integers; we shall call it a “head sequence” of length $m$ and
span $n$, and we shall use it to define a special kind of sorting method. The sorting of
records $R_1 \ldots R_N$ proceeds in several passes, and each pass consists of $N + n - 1$ steps.
On step $j$, for $j = 1-n, 2-n, \ldots, N-1$, the records $R_{j+h[1]}, R_{j+h[2]}, \ldots, R_{j+h[m]}$
are examined and rearranged if necessary so that their keys are in order. (We say
that $R_{j+h[1]}, \ldots, R_{j+h[m]}$ are “under the read-write heads.” When $j + h[k]$ is $< 1$ or
$> N$, record $R_{j+h[k]}$ is left out of consideration; in effect, the keys $K_0, K_{-1}, K_{-2}, \ldots$ are
treated as $-\infty$ and $K_{N+1}, K_{N+2}, \ldots$ are treated as $+\infty$. Therefore step $j$ is actually
trivial when $j \le -h[m-1]$ or $j > N - h[2]$.)

For example, the following table shows one pass of a sort when $m = 3$, $N = 9$,
and $h_1 = 1$, $h_2 = 2$, $h_3 = 4$:
When $m = 2$, $h_1 = 1$, and $h_2 = 2$, this multihead method reduces to the bubble sort
(Algorithm 5.2.2B).
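One pass of the multihead method is straightforward to simulate. In the sketch below (names hypothetical), keys are held in a 0-based Python list while record numbers stay 1-based. Step $j$ sorts whichever of $R_{j+h[1]}, \ldots, R_{j+h[m]}$ lie inside the file. With heads $(1, 2)$ the steps reduce to the adjacent exchanges of one bubble-sort pass.

```python
def multihead_pass(keys, heads):
    """One pass with head sequence heads = (h_1, ..., h_m), h_1 = 1, span n = h_m."""
    N, n = len(keys), heads[-1]
    for j in range(1 - n, N):                            # steps j = 1-n, ..., N-1
        pos = [j + h for h in heads if 1 <= j + h <= N]  # records under the heads
        vals = sorted(keys[p - 1] for p in pos)          # put their keys in order
        for p, v in zip(pos, vals):
            keys[p - 1] = v
    return keys


def bubble_pass(keys):
    """One left-to-right pass of adjacent exchanges."""
    for i in range(len(keys) - 1):
        if keys[i] > keys[i + 1]:
            keys[i], keys[i + 1] = keys[i + 1], keys[i]
    return keys


data = [503, 87, 512, 61, 908, 170, 897, 275]
print(multihead_pass(list(data), (1, 2)) == bubble_pass(list(data)))   # True
```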
58. [21] (James Dugundji.) Prove that if $h[k+1] = h[k] + 1$ for some $k$, $1 \le k < m$,
the multihead sorter defined above will eventually sort any input file in a finite number
of passes. But if $h[k+1] \ge h[k] + 2$ for $1 \le k < m$, the input might never become
sorted.

► 59. [30] (Armstrong and Nelson.) Given that $h[k+1] \le h[k] + k$ for $1 \le k < m$, and
$N \ge n - 1$, prove that the largest $n - 1$ elements always move to their final destination
on the first pass. [Hint: Use the zero-one principle; when sorting 0s and 1s, with fewer
than $n$ 1s, prove that it is impossible to have all heads sensing a 1 unless all 0s lie to
the left of the heads.]

Prove that sorting will be complete in at most $\lceil(N-1)/(n-1)\rceil$ passes when the
heads satisfy the given conditions. Is there an input file that requires this many passes?

60. [26] If $n = N$, prove that the first pass can be guaranteed to place the smallest
key into position $R_1$ if and only if $h[k+1] \le 2h[k]$ for $1 \le k < m$.

61. [34] (J. Hopcroft.) A “perfect sorter” for $N$ elements is a multihead sorter
with $N = n$ that always finishes in one pass. Exercise 59 proves that the sequence
$(h_1, h_2, h_3, h_4, \ldots, h_m) = (1, 2, 4, 7, \ldots, 1 + \binom{m}{2})$ gives a perfect sorter for $N = \binom{m}{2} + 1$
elements, using $m = (\sqrt{8N-7} + 1)/2$ heads. For example, the head sequence $(1, 2, 4, 7,
11, 16, 22)$ is a perfect sorter for 22 elements.

Prove that, in fact, the head sequence $(1, 2, 4, 7, 11, 16, 23)$ is a perfect sorter for
23 elements.
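The positions examined on each step do not depend on the keys, so a pass is a fixed network of small sorters. The zero-one principle therefore lets us test a head sequence with $N = n$ by trying all $2^n$ vectors of 0s and 1s. The sketch below (names hypothetical) confirms the perfect-sorter claim above for the heads $1 + \binom{k}{2}$, $1 \le k \le m$, with $m \le 5$. The seven-head sequences of exercise 61 would require $2^{22}$ or $2^{23}$ trials, too many for this simple loop to run comfortably.

```python
from math import comb


def multihead_pass(keys, heads):
    """One pass of the multihead sort (record numbers 1-based, list 0-based)."""
    N, n = len(keys), heads[-1]
    for j in range(1 - n, N):
        pos = [j + h for h in heads if 1 <= j + h <= N]
        vals = sorted(keys[p - 1] for p in pos)
        for p, v in zip(pos, vals):
            keys[p - 1] = v


def is_perfect_sorter(heads):
    """True if one pass sorts every 0-1 file of N = n = heads[-1] records."""
    n = heads[-1]
    for m in range(2 ** n):
        keys = [(m >> k) & 1 for k in range(n)]
        multihead_pass(keys, heads)
        if keys != sorted(keys):
            return False
    return True


for m in range(2, 6):
    heads = [1 + comb(k, 2) for k in range(1, m + 1)]
    print(heads, is_perfect_sorter(heads))
print([1, 3], is_perfect_sorter([1, 3]))   # [1, 3] False
```

The last line serves as a sanity check. With heads $(1, 3)$ only $R_1$ and $R_3$ are ever compared, so the input 010 is never sorted.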

62. [49] Study the largest $N$ for which $m$-head perfect sorters exist, given $m$. Is
$N = O(m^2)$?

63. [23] (V. Pratt.) When each head $h_k$ is in position $2^{k-1}$ for $1 \le k \le m$, how many
passes are necessary to sort the sequence $z_1 z_2 \ldots z_{2^m-1}$ of 0s and 1s where $z_j = 0$ if
and only if $j$ is a power of 2?

64. [24] (Uniform sorting.) The tree of Fig. 34 in Section 5.3.1 makes the comparison
2:3 in both branches on level 1, and on level 2 it compares 1:3 in each branch unless
that comparison would be redundant. In general, we can consider the class of all sorting
algorithms whose comparisons are uniform in that way; assuming that the $M = \binom{N}{2}$
pairs $\{(a, b) \mid 1 \le a < b \le N\}$ have been arranged into a sequence

$(a_1, b_1), (a_2, b_2), \ldots, (a_M, b_M),$

we can successively make each of the comparisons $K_{a_1}:K_{b_1}$, $K_{a_2}:K_{b_2}$, … whose
outcome is not already known. Each of the $M!$ arrangements of the $(a, b)$ pairs defines a
uniform sorting algorithm. The concept of uniform sorting is due to H. L. Beus [JACM
17 (1970), 482–495], whose work has suggested the next few exercises.

It is convenient to define uniform sorting formally by means of graph theory. Let
$G$ be the directed graph on the vertices $\{1, 2, \ldots, N\}$ having no arcs. For $i = 1, 2,
\ldots, M$ we add arcs to $G$ as follows:

Case 1. $G$ contains a path from $a_i$ to $b_i$. Add the arc $a_i \to b_i$ to $G$.

Case 2. $G$ contains a path from $b_i$ to $a_i$. Add the arc $b_i \to a_i$ to $G$.

Case 3. $G$ contains no path from $a_i$ to $b_i$ or $b_i$ to $a_i$. Compare $K_{a_i}:K_{b_i}$; then add
the arc $a_i \to b_i$ to $G$ if $K_{a_i} < K_{b_i}$, the arc $b_i \to a_i$ if $K_{a_i} > K_{b_i}$.

We are concerned primarily with the number of key comparisons made by a uniform
sorting algorithm, not with the mechanism by which redundant comparisons are
actually avoided. Thus the graph $G$ need not be constructed explicitly; it is used here
merely to help define the concept of uniform sorting.
We shall also consider restricted uniform sorting, in which only paths of length 2
are counted in cases 1, 2, and 3 above. (A restricted uniform sorting algorithm may
make some redundant comparisons, but exercise 65 shows that the analysis is somewhat
simpler in the restricted case.)

Prove that the restricted uniform algorithm is the same as the uniform algorithm
when the sequence of pairs is taken in lexicographic order

$(1,2)(1,3)(1,4)\ldots(1,N)(2,3)(2,4)\ldots(2,N)\ldots(N-1,N).$

Show in fact that both algorithms are equivalent to quicksort (Algorithm 5.2.2Q) when
the keys are distinct and when quicksort’s redundant comparisons are removed as in
exercise 5.2.2-24. (Disregard the order in which the comparisons are actually made in
quicksort; consider only which pairs of keys are compared.)
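The graph definition of uniform sorting can be executed directly, with a depth-first search standing in for “$G$ contains a path.” The sketch below (names hypothetical) counts the key comparisons made under the lexicographic pair sequence. It compares that count with a simple quicksort that uses the first key of each subfile as pivot and partitions stably. That quicksort variant is an assumption made for the illustration, not Algorithm 5.2.2Q itself. Distinct keys are assumed.

```python
from itertools import combinations, permutations


def uniform_comparisons(keys, pairs):
    """Count comparisons of the uniform algorithm; keys[a-1] is K_a."""
    succ = {v: set() for v in range(1, len(keys) + 1)}   # arc u -> v: K_u < K_v

    def has_path(u, v):
        stack, seen = [u], {u}
        while stack:
            w = stack.pop()
            if w == v:
                return True
            for x in succ[w] - seen:
                seen.add(x)
                stack.append(x)
        return False

    count = 0
    for a, b in pairs:
        if has_path(a, b):            # Case 1
            succ[a].add(b)
        elif has_path(b, a):          # Case 2
            succ[b].add(a)
        else:                         # Case 3: an actual key comparison
            count += 1
            if keys[a - 1] < keys[b - 1]:
                succ[a].add(b)
            else:
                succ[b].add(a)
    return count


def quicksort_comparisons(keys):
    """First-key pivot, stable partition, distinct keys."""
    if len(keys) <= 1:
        return 0
    pivot, rest = keys[0], keys[1:]
    less = [k for k in rest if k < pivot]
    more = [k for k in rest if k > pivot]
    return len(rest) + quicksort_comparisons(less) + quicksort_comparisons(more)


N = 5
lex = list(combinations(range(1, N + 1), 2))   # (1,2)(1,3)...(N-1,N)
mismatches = sum(uniform_comparisons(p, lex) != quicksort_comparisons(list(p))
                 for p in permutations(range(1, N + 1)))
print(mismatches)   # 0
```

The agreement is no accident. Under the lexicographic order, the pair $(a, b)$ is already decided exactly when some key with index less than $a$ lies strictly between $K_a$ and $K_b$, and that is precisely when this quicksort variant separates them.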

65. [M38] Given a pair sequence $(a_1, b_1)\ldots(a_M, b_M)$ as in exercise 64, let $c_i$ be the
number of pairs $(j, k)$ such that $j < k < i$ and $(a_i, b_i)$, $(a_j, b_j)$, $(a_k, b_k)$ forms a triangle.

a) Prove that the average number of comparisons made by the restricted uniform
sorting algorithm is $\sum_{i=1}^{M} 2/(c_i + 2)$.

b) Use the results of (a) and exercise 64 to determine the average number of
irredundant comparisons performed by quicksort.

c) The following pair sequence is inspired by (but not equivalent to) merge sorting:

$(1,2)(3,4)(5,6)\ldots(1,3)(1,4)(2,3)(2,4)(5,7)\ldots(1,5)(1,6)(1,7)(1,8)(2,5)\ldots$

Does the uniform method based on this sequence do more or fewer comparisons
than quicksort, on the average?

66. [M29] In the worst case, quicksort does $\binom{N}{2}$ comparisons. Do all restricted
uniform sorting algorithms (in the sense of exercise 64) perform $\binom{N}{2}$ comparisons in
their worst case?

67. [M48] (H. L. Beus.) Does quicksort have the minimum average number of
comparisons, over all (restricted) uniform sorting algorithms?

68.  [25]  The  Ph.D.  thesis  “Electronic  Data  Sorting”  by  Howard  B.  Demuth  (Stanford 
University,  October  1956)  was  perhaps  the  first  publication  to  deal  in  any  detail  with 
questions  of  computational  complexity.  Demuth  considered  several  abstract  models 
for  sorting  devices,  and  established  lower  and  upper  bounds  on  the  mean  and  maxi- 
mum execution  times  achievable  with  each  model.  His  simplest  model,  the  “circular 
nonreversible  memory”  (Fig.  61),  is  the  subject  of  this  exercise. 


Fig.  61.  A device  for  which  the  bubble-sort  strategy  is  optimum. 
Consider a machine that sorts $R_1 R_2 \ldots R_N$ in a number of passes, where each
pass contains the following $N + 1$ steps:

Step 1. Set $R \leftarrow R_1$. ($R$ is an internal machine register.)

Step $i$, for $1 < i \le N$. Either (i) set $R_{i-1} \leftarrow R$, $R \leftarrow R_i$, or (ii) set $R_{i-1} \leftarrow R_i$,
leaving $R$ unchanged.

Step $N + 1$. Set $R_N \leftarrow R$.

The problem is to find a way to choose between alternatives (i) and (ii) each time, in
order to minimize the number of passes required to sort.

Prove that the “bubble sort” technique is optimum for this model. In other words,
show that the strategy that selects alternative (i) whenever $R < R_i$ and alternative (ii)
whenever $R > R_i$ will achieve the minimum number of passes.
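The circular nonreversible memory is easy to model with a list and one extra variable. In the sketch below (names hypothetical), `demuth_pass` performs Steps 1 through $N + 1$ using the bubble-sort rule. When $R = R_i$ it takes alternative (ii); the two choices have the same effect in that case. `passes_to_sort` counts passes until the file is in order. It assumes $N \ge 1$.

```python
def demuth_pass(records):
    """One pass (Steps 1 .. N+1) with the bubble-sort choice of alternatives."""
    r = records[0]                            # Step 1: R <- R_1
    for k in range(2, len(records) + 1):      # Step k, 1 < k <= N
        if r < records[k - 1]:                # alternative (i): R_{k-1} <- R, R <- R_k
            records[k - 2], r = r, records[k - 1]
        else:                                 # alternative (ii): R_{k-1} <- R_k
            records[k - 2] = records[k - 1]
    records[-1] = r                           # Step N+1: R_N <- R
    return records


def passes_to_sort(records):
    """Number of passes of demuth_pass needed to put the records in order."""
    records, passes = list(records), 0
    while any(records[k] > records[k + 1] for k in range(len(records) - 1)):
        demuth_pass(records)
        passes += 1
    return passes


print(demuth_pass([5, 1, 4, 2, 8]))     # [1, 4, 2, 5, 8]
print(passes_to_sort([5, 1, 4, 2, 8]))  # 2
```

Register $R$ always carries the largest key seen so far in the pass. That is exactly how one bubble-sort pass carries its maximum to the right.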


They  that  weave  networks  shall  be  confounded. 

— Isaiah  19:9 
5.4. EXTERNAL SORTING

Now it is time for us to study the interesting problems that arise when the
number  of  records  to  be  sorted  is  larger  than  our  computer  can  hold  in  its 
high-speed  internal  memory.  External  sorting  is  quite  different  from  internal 
sorting,  even  though  the  problem  in  both  cases  is  to  sort  a given  file  into 
nondecreasing  order,  since  efficient  storage  accessing  on  external  files  is  rather 
severely  limited.  The  data  structures  must  be  arranged  so  that  comparatively 
slow  peripheral  memory  devices  (tapes,  disks,  drums,  etc.)  can  quickly  cope  with 
the  requirements  of  the  sorting  algorithm.  Consequently  most  of  the  internal 
sorting  techniques  we  have  studied  (insertion,  exchange,  selection)  are  virtually 
useless  for  external  sorting,  and  it  is  necessary  to  reconsider  the  whole  question. 

Suppose, for example, that we are supposed to sort a file of five million
records $R_1 R_2 \ldots R_{5000000}$, and that each record $R_i$ is 20 words long (although
the keys $K_i$ are not necessarily this long). If only one million of these records
will fit in the internal memory of our computer at one time, what shall we do?

One  fairly  obvious  solution  is  to  start  by  sorting  each  of  the  five  subfiles 
$R_1 \ldots R_{1000000}$, $R_{1000001} \ldots R_{2000000}$, …, $R_{4000001} \ldots R_{5000000}$ independently,
then  to  merge  the  resulting  subfiles  together.  Fortunately  the  process  of  merging 
uses  only  very  simple  data  structures,  namely  linear  lists  that  are  traversed  in 
a sequential  manner  as  stacks  or  as  queues;  hence  merging  can  be  done  without 
difficulty  on  the  least  expensive  external  memory  devices. 

The  process  just  described  — internal  sorting  followed  by  external  merging  — 
is  very  commonly  used,  and  we  shall  devote  most  of  our  study  of  external  sorting 
to  variations  on  this  theme. 

The  ascending  sequences  of  records  that  are  produced  by  the  initial  internal 
sorting  phase  are  often  called  strings  in  the  published  literature  about  sorting; 
this  terminology  is  fairly  widespread,  but  it  unfortunately  conflicts  with  even 
more  widespread  usage  in  other  branches  of  computer  science,  where  “strings” 
are  arbitrary  sequences  of  symbols.  Our  study  of  permutations  has  already  given 
us  a perfectly  good  name  for  the  sorted  segments  of  a file,  which  are  convention- 
ally called  ascending  runs  or  simply  runs.  Therefore  we  shall  consistently  use 
the  word  “runs”  to  describe  sorted  portions  of  a file.  In  this  way  it  is  possible  to 
distinguish  between  “strings  of  runs”  and  “runs  of  strings”  without  ambiguity. 
(Of  course,  “runs  of  a program”  means  something  else  again;  we  can’t  have 
everything.) 

Let  us  consider  first  the  process  of  external  sorting  when  magnetic  tapes 
are  used  for  auxiliary  storage.  Perhaps  the  simplest  and  most  appealing  way  to 
merge  with  tapes  is  the  balanced  two-way  merge  following  the  central  idea  that 
was  used  in  Algorithms  5.2.4N,  S,  and  L.  We  use  four  “working  tapes”  in  this 
process.  During  the  first  phase,  ascending  runs  produced  by  internal  sorting  are 
placed  alternately  on  Tapes  1 and  2,  until  the  input  is  exhausted.  Then  Tapes  1 
and  2 are  rewound  to  their  beginnings,  and  we  merge  the  runs  from  these  tapes, 
obtaining  new  runs  that  are  twice  as  long  as  the  original  ones;  the  new  runs 
are  written  alternately  on  Tapes  3 and  4 as  they  are  being  formed.  (If  Tape 
1 contains  one  more  run  than  Tape  2,  an  extra  “dummy”  run  of  length  0 is 
assumed to be present on Tape 2.) Then all tapes are rewound, and the contents
of  Tapes  3 and  4 are  merged  into  quadruple-length  runs  recorded  alternately  on 
Tapes  1 and  2.  The  process  continues,  doubling  the  length  of  runs  each  time, 
until only one run is left (namely the entire sorted file). If S runs were produced
during the internal sorting phase, and if $2^{k-1} < S \le 2^k$, this balanced two-way
merge procedure makes exactly $k = \lceil \lg S \rceil$ merging passes over all the data.
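A minimal Python sketch of the balanced two-way merge just described, under simplifying assumptions that are not part of the text: each tape is a list of runs, each run is an already-sorted list, rewinding costs nothing, and the standard library's `heapq.merge` performs each two-way merge. The function name `balanced_two_way` is hypothetical. It returns the sorted file together with the number of merging passes, which should equal $\lceil \lg S \rceil$.

```python
from heapq import merge  # lazily merges already-sorted iterables


def balanced_two_way(runs):
    """Simulate balanced two-way merging on four tapes (sketch).

    `runs` is the list of S sorted runs from the internal sorting phase.
    Returns (sorted_records, number_of_merging_passes).
    """
    # Initial distribution: runs go alternately onto Tapes 1 and 2.
    bank = [runs[0::2], runs[1::2]]
    passes = 0
    while len(bank[0]) + len(bank[1]) > 1:
        first, second = bank
        out = [[], []]                      # the two tapes of the other bank
        for i, run in enumerate(first):
            # If the first tape has one more run, pair it with a dummy empty run.
            partner = second[i] if i < len(second) else []
            out[i % 2].append(list(merge(run, partner)))
        bank = out                          # rewind all tapes; banks swap roles
        passes += 1
    remaining = bank[0] + bank[1]
    return (remaining[0] if remaining else []), passes
```

With five initial runs, as in the example that follows, the sketch reports three merging passes.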

For  example,  in  the  situation  above  where  5000000  records  are  to  be  sorted 
with an internal memory capacity of 1000000, we have S = 5. The initial
distribution  phase  of  the  sorting  process  places  five  runs  on  tape  as  follows: 

Tape 1  $R_1 \ldots R_{1000000}$; $R_{2000001} \ldots R_{3000000}$; $R_{4000001} \ldots R_{5000000}$.
Tape 2  $R_{1000001} \ldots R_{2000000}$; $R_{3000001} \ldots R_{4000000}$.
Tape 3  (empty)
Tape 4  (empty)                                                        (1)

The  first  pass  of  merging  then  produces  longer  runs  on  Tapes  3 and  4,  as  it  reads 
Tapes  1 and  2,  as  follows: 

Tape 3  $R_1 \ldots R_{2000000}$; $R_{4000001} \ldots R_{5000000}$.
Tape 4  $R_{2000001} \ldots R_{4000000}$.                              (2)

(A dummy run has implicitly been added at the end of Tape 2, so that the last
run $R_{4000001} \ldots R_{5000000}$ on Tape 1 is merely copied onto Tape 3.) After all tapes
are rewound, the next pass over the data produces

Tape 1  $R_1 \ldots R_{4000000}$.
Tape 2  $R_{4000001} \ldots R_{5000000}$.                              (3)

(Again that run $R_{4000001} \ldots R_{5000000}$ was simply copied; but if we had started
with 8000000 records, Tape 2 would have contained $R_{4000001} \ldots R_{8000000}$ at this
point.) Finally, after another spell of rewinding, $R_1 \ldots R_{5000000}$ is produced on
Tape 3, and the sorting is complete.

Balanced merging can easily be generalized to the case of T tapes, for any
$T \ge 3$. Choose any number P with $1 \le P < T$, and divide the T tapes into two
“banks,” with P tapes on the left bank and $T - P$ on the right. Distribute the
initial runs as evenly as possible onto the P tapes in the left bank; then do a
P-way merge from the left to the right, followed by a $(T - P)$-way merge from
the right to the left, etc., until sorting is complete. The best choice of P usually
turns out to be $\lceil T/2 \rceil$ (see exercises 3 and 4).
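To see how the bank sizes affect the number of passes, here is a small sketch that tracks only run counts, assuming that a pass which merges f runs at a time turns n runs into $\lceil n/f \rceil$ runs and that the passes alternate between P-way and $(T-P)$-way merges. The function name `balanced_passes` is hypothetical.

```python
def balanced_passes(S, T, P):
    """Number of merging passes made by a balanced (P, T-P)-way tape merge on
    S initial runs (sketch).  Only run counts are tracked: a pass that merges
    f runs at a time turns n runs into ceil(n / f) runs."""
    if T < 3 or not 1 <= P < T:
        raise ValueError("need T >= 3 and 1 <= P < T")
    fan_in = (P, T - P)          # left-to-right passes, then right-to-left passes
    runs, passes = S, 0
    while runs > 1:
        runs = -(-runs // fan_in[passes % 2])   # ceiling division
        passes += 1
    return passes
```

For the five-run example it gives three passes with T = 4, P = 2 and two passes with T = 6, P = 3. When P = 1 (or T − P = 1) every other pass is a mere copy, which the loop counts as a pass that leaves the number of runs unchanged.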

Balanced two-way merging is the special case T = 4, P = 2. Let us
reconsider the example above using more tapes, taking T = 6 and P = 3. The
initial distribution now gives us

Tape 1  $R_1 \ldots R_{1000000}$; $R_{3000001} \ldots R_{4000000}$.
Tape 2  $R_{1000001} \ldots R_{2000000}$; $R_{4000001} \ldots R_{5000000}$.
Tape 3  $R_{2000001} \ldots R_{3000000}$.                              (4)




And  the  first  merging  pass  produces 
Tape 4  $R_1 \ldots R_{3000000}$.
Tape 5  $R_{3000001} \ldots R_{5000000}$.                              (5)
Tape 6  (empty)

(A dummy run has been assumed on Tape 3.) The second merging pass completes
the job, placing $R_1 \ldots R_{5000000}$ on Tape 1. In this special case T = 6 is essentially
the same as T = 5, since the sixth tape is used only when $S \ge 7$.

Three-way merging requires more computer processing than two-way merging;
but this is generally negligible compared to the cost of reading, writing,
and rewinding the tapes. We can get a fairly good estimate of the running time
by considering only the amount of tape motion. The example in (4) and (5)
required only two passes over the data, compared to three passes when T = 4,
so the merging takes only about two-thirds as long when T = 6.

Balanced  merging  is  quite  simple,  but  if  we  look  more  closely,  we  find 
immediately  that  it  isn’t  the  best  way  to  handle  the  particular  cases  treated 
above.  Instead  of  going  from  (1)  to  (2)  and  rewinding  all  of  the  tapes,  we  should 
have stopped the first merging pass after Tapes 3 and 4 contained $R_1 \ldots R_{2000000}$
and $R_{2000001} \ldots R_{4000000}$, respectively, with Tape 1 poised ready to read the
records $R_{4000001} \ldots R_{5000000}$. Then Tapes 2, 3, 4 could be rewound and we could
complete  the  sort  by  doing  a three-way  merge  onto  Tape  2.  The  total  number  of 
records  read  from  tape  during  this  procedure  would  be  only  4000000  + 5000000  = 
9000000,  compared  to  5000000  + 5000000  + 5000000  = 15000000  in  the  balanced 
scheme.  A smart  computer  would  be  able  to  figure  this  out. 

Indeed, when we have five runs and four tapes we can do even better by
distributing them as follows:

Tape 1  $R_1 \ldots R_{1000000}$; $R_{3000001} \ldots R_{4000000}$.
Tape 2  $R_{1000001} \ldots R_{2000000}$; $R_{4000001} \ldots R_{5000000}$.
Tape 3  $R_{2000001} \ldots R_{3000000}$.
Tape 4  (empty)

Then  a three-way  merge  to  Tape  4,  followed  by  a rewind  of  Tapes  3 and  4, 
followed  by  a three-way  merge  to  Tape  3,  would  complete  the  sort  with  only 
3000000  + 5000000  = 8000000  records  read. 

And,  of  course,  if  we  had  six  tapes  we  could  put  the  initial  runs  on  Tapes  1 
through 5 and complete the sort in one pass by doing a five-way merge to Tape 6.
These  considerations  indicate  that  simple  balanced  merging  isn’t  the  best,  and 
it  is  interesting  to  look  for  improved  merging  patterns. 

Subsequent  portions  of  this  chapter  investigate  external  sorting  more  deeply. 
In  Section  5.4.1,  we  will  consider  the  internal  sorting  phase  that  produces  the 
initial  runs;  of  particular  interest  is  the  technique  of  “replacement  selection,” 
which  takes  advantage  of  the  order  present  in  most  data  to  produce  long  initial 
runs  that  actually  exceed  the  internal  memory  capacity  by  a significant  amount. 
Section  5.4.1  also  discusses  a suitable  data  structure  for  multiway  merging. 
The most important merging patterns are discussed in Sections 5.4.2 through
5.4.5.  It  is  convenient  to  have  a rather  naive  conception  of  tape  sorting  as  we 
learn  the  characteristics  of  these  patterns,  before  we  come  to  grips  with  the 
harsh  realities  of  real  tape  drives  and  real  data  to  be  sorted.  For  example,  we 
may  blithely  assume  (as  we  did  above)  that  the  original  input  records  appear 
magically  during  the  initial  distribution  phase;  in  fact,  these  input  records  might 
well  occupy  one  of  our  tapes,  and  they  may  even  fill  several  tape  reels  since 
tapes  aren’t  of  infinite  length!  It  is  best  to  ignore  such  mundane  considerations 
until  after  an  academic  understanding  of  the  classical  merging  patterns  has  been 
gained.  Then  Section  5.4.6  brings  the  discussion  down  to  earth  by  discussing 
real-life  constraints  that  strongly  influence  the  choice  of  a pattern.  Section  5.4.6 
compares  the  basic  merging  patterns  of  Sections  5.4.2  through  5.4.5,  using  a 
variety  of  assumptions  that  arise  in  practice. 

Some other approaches to external sorting, not based on merging, are discussed
in Sections 5.4.7 and 5.4.8. Finally Section 5.4.9 completes our survey of
external  sorting  by  treating  the  important  problem  of  sorting  on  bulk  memories 
such  as  disks  and  drums. 

When  this  book  was  first  written,  magnetic  tapes  were  abundant  and  disk 
drives  were  expensive.  But  disks  became  enormously  better  during  the  1980s, 
and  by  the  late  1990s  they  had  almost  completely  replaced  magnetic  tape  units 
on  most  of  the  world’s  computer  systems.  Therefore  the  once-crucial  topic  of 
patterns  for  tape  merging  has  become  of  limited  relevance  to  current  needs. 

Yet  many  of  the  patterns  are  quite  beautiful,  and  the  associated  algorithms 
reflect  some  of  the  best  research  done  in  computer  science  during  its  early  years; 
the  techniques  are  just  too  nice  to  be  discarded  abruptly  onto  the  rubbish  heap 
of  history.  Indeed,  the  ways  in  which  these  methods  blend  theory  with  practice 
are  especially  instructive.  Therefore  merging  patterns  are  discussed  carefully 
and  completely  below,  in  what  may  be  their  last  grand  appearance  before  they 
accept  a final  curtain  call. 

For  all  we  know  now, 
these  techniques  may  well  become  crucial  once  again. 

— PAVEL  CURTIS  (1997) 

EXERCISES 

1.  [15]  The  text  suggests  internal  sorting  first,  followed  by  external  merging.  Why 
don’t  we  do  away  with  the  internal  sorting  phase,  simply  merging  the  records  into 
longer  and  longer  runs  right  from  the  start? 

2. [10] What will the sequence of tape contents be, analogous to (1) through (3),
when the example records $R_1 R_2 \ldots R_{5000000}$ are sorted using a 3-tape balanced method
with  P = 2?  Compare  this  to  the  4-tape  merge;  how  many  passes  are  made  over  all 
the  data,  after  the  initial  distribution  of  runs? 

3. [20] Show that the balanced $(P, T-P)$-way merge applied to S initial runs takes
2k passes, when $P^k (T-P)^{k-1} < S \le P^k (T-P)^k$; and it takes $2k + 1$ passes, when
$P^k (T-P)^k < S \le P^{k+1} (T-P)^k$.

Give simple formulas for (a) the exact number of passes, as a function of S, when
T = 2P; and (b) the approximate number of passes, as $S \to \infty$, for general P and T.

4. [HM15] What value of P, for $1 \le P < T$, makes $P(T - P)$ a maximum?
5.4.1. Multiway Merging and Replacement Selection

In  Section  5.2.4,  we  studied  internal  sorting  methods  based  on  two-way  merging, 
the  process  of  combining  two  ordered  sequences  into  a single  ordered  sequence. 
It is not difficult to extend this to the notion of P-way merging, where P runs
of  input  are  combined  into  a single  run  of  output. 

Let’s  assume  that  we  have  been  given  P ascending  runs,  that  is,  sequences 
of  records  whose  keys  are  in  nondecreasing  order.  The  obvious  way  to  merge 
them  is  to  look  at  the  first  record  of  each  run  and  to  select  the  record  whose 
key  is  smallest;  this  record  is  transferred  to  the  output  and  removed  from  the 
input,  and  the  process  is  repeated.  At  any  given  time  we  need  to  look  at  only  P 
keys  (one  from  each  input  run)  and  select  the  smallest.  If  two  or  more  keys  are 
smallest,  an  arbitrary  one  is  selected. 
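The obvious method can be written down directly. This is a sketch under the assumption that each run is a Python list of keys already in nondecreasing order; the name `naive_multiway_merge` is hypothetical. Each output record costs at most $P - 1$ key comparisons among the current first keys.

```python
def naive_multiway_merge(runs):
    """P-way merge by direct selection (sketch): look at the first remaining
    record of each run and transfer the smallest to the output, using at most
    P - 1 comparisons per record.  Runs are sorted lists of keys."""
    pos = [0] * len(runs)          # index of the next unread record of each run
    out = []
    while True:
        best = None
        for r, run in enumerate(runs):
            if pos[r] == len(run):
                continue           # this run is exhausted
            if best is None or run[pos[r]] < runs[best][pos[best]]:
                best = r           # on ties the earlier run wins (an arbitrary choice)
        if best is None:
            return out             # every run is exhausted
        out.append(runs[best][pos[best]])
        pos[best] += 1
```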

When  P isn’t  too  large,  it  is  convenient  to  make  this  selection  by  simply 
doing $P - 1$ comparisons to find the smallest of the current keys. But when
P is,  say,  8 or  more,  we  can  save  work  by  using  a selection  tree  as  described  in 
Section  5.2.3;  then  only  about  lg  P comparisons  are  needed  each  time,  once  the 
tree  has  been  set  up. 

Consider, for example, the case of four-way merging, with a two-level selection
tree:


[Figure: four-way merging with a two-level selection tree, shown at Step 1, Step 2, Step 3, ..., Step 9. The four input runs are 087 503 ∞; 170 908 ∞; 154 426 653 ∞; and 612 ∞. The output is 087 after Step 1; 087 154 after Step 2; 087 154 170 after Step 3; and 087 154 170 426 503 612 653 908 ∞ after Step 9.]


An additional key “∞” has been placed at the end of each run in this example,
so that the merging terminates gracefully. Since external merging generally
deals with very long runs, the addition of records with ∞ keys does not add
substantially  to  the  length  of  the  data  or  to  the  amount  of  work  involved  in 
merging,  and  such  sentinel  records  frequently  serve  as  a useful  way  to  delimit 
the  runs  on  a file. 
Fig. 62. A tournament to select the smallest key, using a complete binary tree
whose  nodes  are  numbered  from  1 to  23. 

Each  step  after  the  first  in  this  process  consists  of  replacing  the  smallest 
element  by  the  succeeding  element  in  its  run,  and  changing  the  corresponding 
path  in  the  selection  tree.  Thus  the  three  positions  of  the  tree  that  contain  087 
in  Step  1 are  changed  in  Step  2;  the  three  positions  containing  154  in  Step  2 are 
changed  in  Step  3;  and  so  on.  The  process  of  replacing  one  key  by  another  in 
the  selection  tree  is  called  replacement  selection. 
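A minimal sketch of this selection-tree merge, assuming numeric keys and using Python's `math.inf` as the “∞” sentinel. The array layout is the usual one for complete binary trees (node n has children 2n and 2n + 1; external node P + j holds run j); the function name `winner_tree_merge` is hypothetical. After each output, only the matches on the path from the changed external node to the root are replayed, about lg P comparisons.

```python
import math  # math.inf plays the role of the sentinel key "∞"


def winner_tree_merge(runs):
    """P-way merge with a selection ("winner") tree; sketch, P >= 1.
    Internal node n (1 <= n < P) has children 2n and 2n+1; external node P+j
    holds the current key of run j.  Each run is followed by an ∞ sentinel."""
    P = len(runs)
    streams = [iter(list(run) + [math.inf]) for run in runs]
    key = [next(s) for s in streams]          # current key of each run
    win = [0] * (2 * P)                       # win[n] = run index winning at node n
    for j in range(P):
        win[P + j] = j
    for n in range(P - 1, 0, -1):             # build the tree bottom-up
        a, b = win[2 * n], win[2 * n + 1]
        win[n] = a if key[a] <= key[b] else b
    out = []
    while key[win[1]] != math.inf:            # champion is ∞ only when all runs end
        j = win[1]
        out.append(key[j])
        key[j] = next(streams[j])             # replace by the next key of its run
        n = (P + j) // 2
        while n >= 1:                         # replay only the path above run j
            a, b = win[2 * n], win[2 * n + 1]
            win[n] = a if key[a] <= key[b] else b
            n //= 2
    return out
```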

We  can  look  at  this  four-way  merge  in  several  ways.  From  one  standpoint  it 
is  equivalent  to  three  two-way  merges  performed  concurrently  as  coroutines;  each 
node  in  the  selection  tree  represents  one  of  the  sequences  involved  in  concurrent 
merging  processes.  The  selection  tree  is  also  essentially  operating  as  a priority 
queue,  with  a smallest-in-first-out  discipline. 

As  in  Section  5.2.3  we  could  implement  the  priority  queue  by  using  a heap 
instead  of  a selection  tree.  (The  heap  would,  of  course,  be  arranged  so  that  the 
smallest  element  appears  at  the  top,  instead  of  the  largest,  reversing  the  order  of 
Eq.  5.2.3-(3).)  Since  a heap  does  not  have  a fixed  size,  we  could  therefore  avoid 
the use of ∞ keys; merging would be complete when the heap becomes empty.
On  the  other  hand,  external  sorting  applications  usually  deal  with  comparatively 
long  records  and  keys,  so  that  the  heap  is  filled  with  pointers  to  keys  instead  of 
the  keys  themselves;  we  shall  see  below  that  selection  trees  can  be  represented  by 
pointers  in  such  a convenient  manner  that  they  are  probably  superior  to  heaps 
in  this  situation. 
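For comparison, here is a sketch of the heap alternative using Python's standard `heapq` module. The heap holds (key, run index, position) tuples, which act as pointers into the runs, and no ∞ sentinels are needed: a run simply leaves the heap when it is exhausted. The function name `heap_multiway_merge` is hypothetical.

```python
import heapq


def heap_multiway_merge(runs):
    """P-way merge with a min-heap priority queue (sketch).  No ∞ sentinels:
    a run drops out of the heap when exhausted, and merging stops when the
    heap is empty.  Entries are (key, run index, position) tuples."""
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        k, r, i = heapq.heappop(heap)       # smallest-in-first-out
        out.append(k)
        if i + 1 < len(runs[r]):            # advance within the same run
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out
```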

A tree  of  losers.  Figure  62  shows  the  complete  binary  tree  with  12  external 
(rectangular)  nodes  and  11  internal  (circular)  nodes.  The  external  nodes  have 
been  filled  with  keys,  and  the  internal  nodes  have  been  filled  with  the  “winners,” 
if  the  tree  is  regarded  as  a tournament  to  select  the  smallest  key.  The  smaller 
numbers above each node show the traditional way to allocate consecutive storage
positions for complete binary trees.
Fig. 63. The same tournament as Fig. 62, but showing the losers instead of the
winners;  the  champion  appears  at  the  very  top. 

When  the  smallest  key,  061,  is  to  be  replaced  by  another  key  in  the  selection 
tree  of  Fig.  62,  we  will  have  to  look  at  the  keys  512,  087,  and  154,  and  no 
other  existing  keys,  in  order  to  determine  the  new  state  of  the  selection  tree. 
Considering  the  tree  as  a tournament,  these  three  keys  are  the  losers  in  the 
matches  played  by  061.  This  suggests  that  the  loser  of  a match  should  actually 
be  stored  in  each  internal  node  of  the  tree,  instead  of  the  winner;  then  the 
information  required  for  updating  the  tree  will  be  readily  available. 

Figure  63  shows  the  same  tree  as  Fig.  62,  but  with  the  losers  represented 
instead  of  the  winners.  An  extra  node  number  0 has  been  appended  at  the  top 
of  the  tree,  to  indicate  the  champion  of  the  tournament.  Each  key  except  the 
champion  is  a loser  exactly  once  (see  Section  5.3.3),  so  each  key  appears  just 
once  in  an  external  node  and  once  in  an  internal  node. 

In  practice,  the  external  nodes  at  the  bottom  of  Fig.  63  will  represent  fairly 
long  records  stored  in  computer  memory,  and  the  internal  nodes  will  represent 
pointers to those records. Note that P-way merging calls for exactly P external
nodes  and  P internal  nodes,  each  in  consecutive  positions  of  memory,  hence 
several  efficient  methods  of  storage  allocation  suggest  themselves.  It  is  not 
difficult  to  see  how  to  use  a loser-oriented  tree  for  replacement  selection;  we 
shall  discuss  the  details  later. 
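Ahead of that discussion, here is one possible sketch of merging with a tree of losers, assuming numeric keys with `math.inf` as the sentinel. It uses the numbering of Fig. 63 (internal nodes 1 to P − 1, external node P + j for run j, extra node 0 for the champion); the function name `loser_tree_merge` is hypothetical, and this is not the detailed algorithm given later in the text. When the champion is replaced, the new key plays only against the losers stored on its path.

```python
import math


def loser_tree_merge(runs):
    """P-way merge with a tree of losers (sketch; P >= 1 runs of sorted keys).

    Node numbering follows Fig. 63: internal nodes 1..P-1, external node P + j
    for run j, and an extra node 0 that records the overall champion.
    loser[n] is the run index that lost the match played at internal node n.
    """
    P = len(runs)
    streams = [iter(list(run) + [math.inf]) for run in runs]  # ∞ ends each run
    key = [next(s) for s in streams]          # current key of each run
    # Initialization: play the whole tournament once, bottom-up.
    winner = [0] * (2 * P)
    loser = [0] * P
    for j in range(P):
        winner[P + j] = j
    for n in range(P - 1, 0, -1):
        a, b = winner[2 * n], winner[2 * n + 1]
        winner[n], loser[n] = (a, b) if key[a] <= key[b] else (b, a)
    loser[0] = winner[1]                      # node 0 holds the champion
    out = []
    while key[loser[0]] != math.inf:
        q = loser[0]
        out.append(key[q])
        key[q] = next(streams[q])             # replace the champion's key
        n = (P + q) // 2                      # internal node above external P + q
        while n >= 1:
            # The stored loser is the only opponent we need to look at here.
            if key[loser[n]] < key[q]:
                loser[n], q = q, loser[n]     # newcomer loses; old loser moves up
            n //= 2
        loser[0] = q
    return out
```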

Initial runs by replacement selection. The technique of replacement selection
can be used also in the first phase of external sorting, if we essentially
do a P-way merge of the input data with itself! In this case we take P to be
quite  large,  so  that  the  internal  memory  is  essentially  filled.  When  a record  is 
output,  it  is  replaced  by  the  next  record  from  the  input.  If  the  new  record  has  a 
smaller  key  than  the  one  just  output,  we  cannot  include  it  in  the  current  run;  but 
Table 1
EXAMPLE OF FOUR-WAY REPLACEMENT SELECTION

Memory contents                Output
503    087    512    061       061
503    087    512    908       087
503    170    512    908       170
503    897    512    908       503
(275)  897    512    908       512
(275)  897    653    908       653
(275)  897    (426)  908       897
(275)  (154)  (426)  908       908
(275)  (154)  (426)  (509)     (end of run)
275    154    426    509       154
275    612    426    509       275
etc.

otherwise  we  can  enter  it  into  the  selection  tree  in  the  usual  way  and  it  will  form 
part  of  the  run  currently  being  produced.  Thus  the  runs  can  contain  more  than 
P records  each,  even  though  we  never  have  more  than  P in  the  selection  tree  at 
any time. Table 1 illustrates this process for P = 4; parenthesized numbers are
waiting  for  inclusion  in  the  following  run. 

This important method of forming initial runs was first described by Harold
H. Seward [Master’s Thesis, Digital Computer Laboratory Report R-232
(Mass. Inst. of Technology, 1954), 29-30], who gave reason to believe that the
runs would contain more than 1.5P records when applied to random data. A. I.
Dumey had also suggested the idea about 1950 in connection with a special sorting
device planned by Engineering Research Associates, but he did not publish it.
The name “replacement selecting” was coined by E. H. Friend [JACM 3 (1956),
154], who remarked that “the expected length of the sequences produced eludes
formulation but experiment suggests that 2P is a reasonable expectation.”

A clever way to show that 2P is indeed the expected run length was discovered
by E. F. Moore, who compared the situation to a snowplow on a circular
track [U.S. Patent 2983904 (1961), columns 3-4]. Consider the situation shown
in Fig. 64: Flakes of snow are falling uniformly on a circular road, and a lone
snowplow is continually clearing the snow. Once the snow has been plowed off
the road, it disappears from the system. Points on the road may be designated by
real numbers x, $0 \le x < 1$; a flake of snow falling at position x represents an input
record  whose  key  is  x,  and  the  snowplow  represents  the  output  of  replacement 
selection.  The  ground  speed  of  the  snowplow  is  inversely  proportional  to  the 
height  of  snow  it  encounters,  and  the  situation  is  perfectly  balanced  so  that  the 
total  amount  of  snow  on  the  road  at  all  times  is  exactly  P.  A new  run  is  formed 
in  the  output  whenever  the  plow  passes  point  0. 

After this system has been in operation for a while, it is intuitively clear that
it  will  approach  a stable  situation  in  which  the  snowplow  runs  at  constant  speed 
(because  of  the  circular  symmetry  of  the  track).  This  means  that  the  snow  is  at 
Fig. 64. The perpetual plow on its ceaseless cycle.

constant  height  when  it  meets  the  plow,  and  the  height  drops  off  linearly  in  front 
of  the  plow  as  shown  in  Fig.  65.  It  follows  that  the  volume  of  snow  removed  in 
one  revolution  (namely  the  run  length)  is  twice  the  amount  present  at  any  one 
time  (namely  P). 




Fig.  65.  Cross-section,  showing  the  varying  height  of  snow  in  front  of  the  plow  when 
the  system  is  in  its  steady  state. 

In  many  commercial  applications  the  input  data  is  not  completely  random; 
it  already  has  a certain  amount  of  existing  order.  Therefore  the  runs  produced  by 
replacement  selection  will  tend  to  contain  even  more  than  2 P records.  We  shall 
see  that  the  time  required  for  external  merge  sorting  is  largely  governed  by  the 
number  of  runs  produced  by  the  initial  distribution  phase,  so  that  replacement 
selection becomes especially desirable; other types of internal sorting would produce
about twice as many initial runs because of the limitations on memory size.

Let  us  now  consider  the  process  of  creating  initial  runs  by  replacement 
selection  in  detail.  The  following  algorithm  is  due  to  John  R.  Walters,  James 
Painter,  and  Martin  Zalk,  who  used  it  in  a merge-sort  program  for  the  Philco 
2000  in  1958.  It  incorporates  a rather  nice  way  to  initialize  the  selection  tree 
and  to  distinguish  records  belonging  to  different  runs,  as  well  as  to  flush  out  the 
last  run,  with  comparatively  simple  and  uniform  logic.  (The  proper  handling 
of  the  last  run  produced  by  replacement  selection  turns  out  to  be  a bit  tricky, 
Fig. 66. Making initial runs by replacement selection.

and  it  has  tended  to  be  a stumbling  block  for  programmers.)  The  principal  idea 
is to consider each key as a pair (S, K), where K is the original key and S is
the  run  number  to  which  this  record  belongs.  When  such  extended  keys  are 
lexicographically  ordered,  with  S as  major  key  and  K as  minor  key,  we  obtain 
the  output  sequence  produced  by  replacement  selection. 
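The extended-key idea can be tried out with a binary heap (Python's `heapq`) standing in for the selection tree; this swaps in a different priority-queue structure and is not Algorithm R itself. Memory holds at most P pairs (S, K), and a new key smaller than LASTKEY is tagged with run number S + 1. The function name `replacement_selection` is hypothetical. In the usage test the first thirteen keys follow Table 1, while the last three keys (677, 765, 703) are made up for illustration.

```python
import heapq

_END = object()  # marks the end of the input


def replacement_selection(records, P):
    """Form initial runs by replacement selection (sketch, P >= 1).

    Memory holds at most P entries, each an extended key (S, K) with run
    number S as major key and original key K as minor key.  A binary heap
    (Python's heapq) stands in for the selection tree of Algorithm R.
    """
    source = iter(records)
    heap = []
    while len(heap) < P:                        # fill memory with P records
        k = next(source, _END)
        if k is _END:
            break
        heap.append((1, k))
    heapq.heapify(heap)
    runs = []
    while heap:
        s, lastkey = heapq.heappop(heap)        # smallest (S, K) pair
        if s > len(runs):
            runs.append([])                     # first record of run s
        runs[-1].append(lastkey)
        k = next(source, _END)
        if k is not _END:
            # A key smaller than LASTKEY must wait for the following run.
            heapq.heappush(heap, (s + 1 if k < lastkey else s, k))
    return runs
```

With P = 4 the sixteen keys of the test yield two runs of eight records each. On random keys the average run length comes out close to 2P, in line with the snowplow argument.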

The  algorithm  below  uses  a data  structure  containing  P nodes  to  represent 
the selection tree; the jth node X[j] is assumed to contain c words beginning
in LOC(X[j]) = $L_0 + cj$, for $0 \le j < P$, and it represents both internal node
number j and external node number P + j in Fig. 63. There are several named
fields  in  each  node: 

KEY  = the  key  stored  in  this  external  node; 

RECORD  = the  record  stored  in  this  external  node  (including  KEY  as  a subfield); 
LOSER  = pointer  to  the  “loser”  stored  in  this  internal  node; 

RN  = run  number  of  the  record  stored  in  this  external  node; 

PE  = pointer  to  internal  node  above  this  external  node  in  the  tree; 

PI  = pointer  to  internal  node  above  this  internal  node  in  the  tree. 

For  example,  when  P = 12,  internal  node  number  5 and  external  node  number  17 
of Fig. 63 would both be represented in X[5], by the fields KEY = 170, LOSER =
$L_0 + 9c$ (the address of external node number 21), PE = $L_0 + 8c$, PI = $L_0 + 2c$.

The  PE  and  PI  fields  have  constant  values,  so  they  need  not  appear  explicitly 
in  memory;  however,  the  initial  phase  of  external  sorting  sometimes  has  trouble 
keeping  up  with  the  I/O  devices,  and  it  might  be  worthwhile  to  store  these 
redundant  values  with  the  data  instead  of  recomputing  them  each  time. 

Algorithm R (Replacement selection). This algorithm reads records sequentially
from an input file and writes them sequentially onto an output file, producing
RMAX runs whose length is P or more (except for the final run). There
are $P \ge 2$ nodes, X[0], . . . , X[P − 1], having fields as described above.

R1. [Initialize.] Set RMAX ← 0, RC ← 0, LASTKEY ← ∞, and Q ← LOC(X[0]).
(Here RC is the number of the current run and LASTKEY is the key of the
last record output. The initial setting of LASTKEY should be larger than any
possible key; see exercise 8.) For $0 \le j < P$, set the initial contents of X[j]
as follows:

J ← LOC(X[j]); LOSER(J) ← J; RN(J) ← 0;
PE(J) ← LOC(X[⌊(P + j)/2⌋]); PI(J) ← LOC(X[⌊j/2⌋]).

(The  settings  of  LOSER(J)  and  RN(J)  are  artificial  ways  to  get  the  tree 
initialized  by  considering  a fictitious  run  number  0 that  is  never  output. 
This  is  tricky;  see  exercise  10.) 

R2.  [End  of  run?]  If  RN(Q)  = RC,  go  on  to  step  R3.  (Otherwise  RN(Q)  = RC  + 1 
and  we  have  just  completed  run  number  RC;  any  special  actions  required  by 
a merging  pattern  for  subsequent  passes  of  the  sort  would  be  done  at  this 
point.) If RC = RMAX, stop; otherwise set RC ← RC + 1.

R3.  [Output  top  of  tree.]  (Now  Q points  to  the  “champion,”  and  RN(Q)  = RC.) 
If RC ≠ 0, output RECORD(Q) and set LASTKEY ← KEY(Q).

R4. [Input new record.] If the input file is exhausted, set RN(Q) ← RMAX + 1
and go on to step R5. Otherwise set RECORD(Q) to the next record from the
input file. If KEY(Q) < LASTKEY (so that this new record does not belong to
the current run), set RMAX ← RN(Q) ← RC + 1.

R5. [Prepare to update.] (Now Q points to a new record.) Set T ← PE(Q).
(Variable  T is  a pointer  that  will  move  up  the  tree.) 

R6. [Set new loser.] Set L ← LOSER(T). If RN(L) < RN(Q) or if RN(L) = RN(Q)
and KEY(L) < KEY(Q), then set LOSER(T) ← Q and Q ← L. (Variable Q
keeps  track  of  the  current  winner.) 

R7. [Move up.] If T = LOC(X[1]) then go back to R2, otherwise set T ← PI(T)
and return to R6. ▮
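A direct Python transcription of steps R1 through R7 may help in following the control flow. It is a sketch under stated assumptions: records are bare numeric keys, the pointers LOC(X[j]) become plain indices j, and the keys of the fictitious run 0 are set to an arbitrary numeric placeholder (0) so that comparisons between unfilled nodes are harmless. The function name `algorithm_r` is hypothetical, and runs are collected in lists instead of being written to an output file.

```python
import math

_END = object()  # marks the end of the input file


def algorithm_r(records, P):
    """Python transcription of Algorithm R (sketch; P >= 2, numeric keys).

    Node j (0 <= j < P) plays both internal node j and external node P + j;
    the pointers LOC(X[j]) become plain indices j.  A record is just its key.
    """
    key = [0] * P                               # placeholder keys for fictitious run 0
    loser = list(range(P))                      # R1: LOSER(J) <- J
    rn = [0] * P                                # R1: RN(J) <- 0
    pe = [(P + j) // 2 for j in range(P)]       # node above external node P + j
    pi = [j // 2 for j in range(P)]             # node above internal node j
    rmax = rc = 0
    lastkey = math.inf                          # larger than any possible key
    q = 0
    source = iter(records)
    runs = []
    while True:
        # R2. [End of run?]
        if rn[q] != rc:
            if rc == rmax:
                return runs
            rc += 1
            runs.append([])                     # run number rc begins
        # R3. [Output top of tree.]
        if rc != 0:
            runs[-1].append(key[q])
            lastkey = key[q]
        # R4. [Input new record.]
        k = next(source, _END)
        if k is _END:
            rn[q] = rmax + 1
        else:
            key[q] = k
            if k < lastkey:                     # cannot join the current run
                rmax = rn[q] = rc + 1
        # R5. [Prepare to update.]
        t = pe[q]
        while True:
            # R6. [Set new loser.]
            l = loser[t]
            if rn[l] < rn[q] or (rn[l] == rn[q] and key[l] < key[q]):
                loser[t], q = q, l
            # R7. [Move up.]
            if t == 1:
                break
            t = pi[t]
```

Because every node starts in run 0 with LASTKEY = ∞, the first P records are all tagged as run 1 while they filter into the tree, which is the initialization trick mentioned in step R1.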

Algorithm  R speaks  of  input  and  output  of  records  one  at  a time,  while  in 
practice  it  is  best  to  read  and  write  relatively  large  blocks  of  records.  Therefore 
some  input  and  output  buffers  are  actually  present  in  memory,  behind  the  scenes, 
effectively  lowering  the  size  of  P.  We  shall  illustrate  this  in  Section  5.4.6. 

*Delayed  reconstitution  of  runs.  A very  interesting  way  to  improve  on 
replacement  selection  has  been  suggested  by  R.  J.  Dinsmore  [CACM  8 (1965), 
48]  using  a concept  that  we  shall  call  degrees  of  freedom.  As  we  have  seen, 
each  block  of  records  on  tape  within  a run  is  in  nondecreasing  order,  so  that  its 
first element is the lowest and its last element is the highest. In the ordinary
process of replacement selection, the lowest element of each block within a run
is never less than the highest element of the preceding block in that run; this is
“1 degree of freedom.” Dinsmore suggests relaxing this condition to “m degrees
of freedom,” where the lowest element of each block may be less than the highest
element of the preceding block, so long as it is not simultaneously less than the
highest elements in m different preceding blocks of the same run (in other words,
it may fall below the highest elements of at most m − 1 preceding blocks).
are  ordered,  as  before,  but  adjacent  blocks  need  not  be  in  order. 
For example, suppose that there are just two records per block; the following
sequence  of  blocks  is  a run  with  three  degrees  of  freedom: 

| 08 50 | 06 90 | 17 27 | 42 67 | 51 89 |     (1)


A subsequent  block  that  is  to  be  part  of  the  same  run  must  begin  with  an 
element  not  less  than  the  third  largest  element  of  {50,  90,  27,  67,  89},  namely  67. 
The sequence (1) would not be a run if there were only two degrees of freedom,
since  17  is  less  than  both  50  and  90. 
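The condition can be checked mechanically. This sketch reads “m degrees of freedom” as: the first key of each block may be smaller than the highest keys of at most m − 1 preceding blocks of the run, which is the reading the example above requires. Both function names are hypothetical.

```python
def is_run_with_freedom(blocks, m):
    """Check the m-degrees-of-freedom condition (sketch): every block is
    internally sorted, and the first key of each block is smaller than the
    last (highest) key of at most m - 1 of the blocks that precede it."""
    highs = []
    for block in blocks:
        if any(x > y for x, y in zip(block, block[1:])):
            return False                       # block itself is out of order
        if sum(1 for h in highs if block[0] < h) > m - 1:
            return False
        highs.append(block[-1])
    return True


def next_block_threshold(blocks, m):
    """Smallest key a following block may start with: the m-th largest of the
    block maxima so far (None if fewer than m blocks impose no constraint)."""
    highs = sorted((b[-1] for b in blocks), reverse=True)
    return highs[m - 1] if len(highs) >= m else None
```

On the blocks of (1) it confirms three degrees of freedom, rejects two, and reports 67 as the smallest admissible first key for the next block.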

A run with m degrees of freedom can be “reconstituted” while it is being
read during the next phase of sorting, so that for all practical purposes it is a run
in the ordinary sense. We start by reading the first m blocks into m buffers, and
doing an m-way merge on them; when one buffer is exhausted, we replace it with
the (m + 1)st block, and so on. In this way we can recover the run as a single
sequence, for the first word of every newly read block must be greater than or
equal to the last word of the just-exhausted block (lest it be less than the highest
elements in m different blocks that precede it). This method of reconstituting
the run is essentially like an m-way merge using a single tape unit for all the
input  blocks!  The  reconstitution  procedure  acts  as  a coroutine  that  is  called 
upon  to  deliver  one  record  of  the  run  at  a time.  We  could  be  reconstituting 
different  runs  from  different  tape  units  with  different  degrees  of  freedom,  and 
merging  the  resulting  runs,  all  at  the  same  time,  in  essentially  the  same  way  as 
the  four-way  merge  illustrated  at  the  beginning  of  this  section  may  be  thought 
of  as  several  two-way  merges  going  on  at  once. 
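The reconstitution procedure can be sketched as a Python generator, which plays the role of the coroutine that delivers one record at a time. Assumptions: blocks are nonempty sorted lists read in order from one “tape,” and a `heapq` min-heap performs the m-way merge among the buffers. The name `reconstitute` is hypothetical.

```python
import heapq


def reconstitute(blocks, m):
    """Recover a run with m degrees of freedom as one sorted stream (sketch):
    keep m blocks in buffers and merge them m-way; whenever a buffer is
    exhausted, refill it with the next block from the same 'tape'."""
    tape = iter(blocks)
    heap, seq = [], 0            # seq breaks ties so blocks are never compared

    def load():
        nonlocal seq
        block = next(tape, None)
        if block:
            heapq.heappush(heap, (block[0], seq, block, 0))
            seq += 1

    for _ in range(m):           # read the first m blocks into m buffers
        load()
    while heap:
        key, s, block, i = heapq.heappop(heap)
        yield key
        if i + 1 < len(block):
            heapq.heappush(heap, (block[i + 1], s, block, i + 1))
        else:
            load()               # buffer exhausted: read the next block
```

Applied to the run (1) with m = 3, it delivers 06 08 17 27 42 50 51 67 89 90 in order.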

This  ingenious  idea  is  difficult  to  analyze  precisely,  but  T.  O.  Espelid  has 
shown  how  to  extend  the  snowplow  analogy  to  obtain  an  approximate  formula 
for  the  behavior  [BIT  16  (1976),  133-142].  According  to  his  approximation, 
which  agrees  well  with  empirical  tests,  the  run  length  will  be  about 


$$2P + (m - 1.5)\left(\frac{2P + (m-2)b}{2P + (2m-3)b}\right) b,$$

when b is the block size and $m \ge 2$. Such an increase may not be enough to
justify  the  added  complication;  on  the  other  hand,  it  may  be  advantageous  when 
there  is  room  for  a rather  large  number  of  buffers  during  the  second  phase  of 
sorting. 
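Espelid's approximation is easy to evaluate numerically. This small helper (hypothetical name `espelid_run_length`) simply codes the formula above for $m \ge 2$.

```python
def espelid_run_length(P, m, b):
    """Espelid's approximate run length for replacement selection with m >= 2
    degrees of freedom, P records in memory, and block size b."""
    if m < 2:
        raise ValueError("the approximation is stated for m >= 2")
    return 2 * P + (m - 1.5) * (2 * P + (m - 2) * b) / (2 * P + (2 * m - 3) * b) * b
```

When b is small compared with P, the fraction is close to 1, so the gain over ordinary replacement selection is roughly $(m - 1.5)b$ records per run.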


*Natural selection. Another way to increase the run lengths produced by
replacement  selection  has  been  explored  by  W.  D.  Frazer  and  C.  K.  Wong  [CACM 
15  (1972),  910-913].  Their  idea  is  to  proceed  as  in  Algorithm  R,  except  that 
a new  record  is  not  placed  in  the  tree  when  its  key  is  less  than  LASTKEY;  it  is 
output  into  an  external  reservoir  instead,  and  another  new  record  is  read  in.  This 
process continues until the reservoir is filled with a certain number of records, P′;
then  the  remainder  of  the  current  run  is  output  from  the  tree,  and  the  reservoir 
items  are  used  as  input  for  the  next  run. 

The use of a reservoir tends to produce longer runs than replacement selection,
because it reroutes the “dead” records that belong to the next run instead
of  letting  them  clutter  up  the  tree;  but  it  requires  extra  time  for  input  and  output 
to and from the reservoir. When P' > P it is possible that some records will be
placed into the reservoir twice, but when P' ≤ P this will never happen.

Frazer  and  Wong  made  extensive  empirical  tests  of  their  method,  noticing 
that  when  P is  reasonably  large  (say  P > 32)  and  P'  = P the  average  run 
length for random data is approximately given by eP, where e ≈ 2.718 is the
base  of  natural  logarithms.  This  phenomenon,  and  the  fact  that  the  method 
is  an  evolutionary  improvement  over  simple  replacement  selection,  naturally  led 
them  to  call  their  method  natural  selection. 
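The eP behavior can be checked empirically with a small simulation. The following is a sketch of our own, not Frazer and Wong's program. It assumes that records are represented by their keys alone, that a binary heap stands in for the tree of Algorithm R, and that the reservoir is read back first-in-first-out at the start of the next run. The function name `natural_selection_runs` is hypothetical.

```python
import heapq
import random
from collections import deque


def natural_selection_runs(keys, P, reservoir_size):
    """Run lengths from natural selection with memory P and reservoir P' >= 1."""
    fresh = iter(keys)
    pending = deque()                     # reservoir contents from the last run

    def read():
        # Records from the previous reservoir are read before fresh input.
        return pending.popleft() if pending else next(fresh, None)

    runs = []
    while True:
        heap = []
        while len(heap) < P and (r := read()) is not None:
            heap.append(r)
        if not heap:
            return runs
        heapq.heapify(heap)
        reservoir, length = [], 0
        while heap:
            last = heapq.heappop(heap)    # output; this key is now LASTKEY
            length += 1
            r = read()
            while r is not None and r < last:
                reservoir.append(r)       # too small for this run: divert it
                if len(reservoir) == reservoir_size:
                    break
                r = read()
            if len(reservoir) == reservoir_size:
                length += len(heap)       # reservoir full: flush the rest of the run
                heap = []
            elif r is not None:
                heapq.heappush(heap, r)   # r >= LASTKEY, so it joins this run
        runs.append(length)
        pending.extend(reservoir)


if __name__ == "__main__":
    random.seed(2)
    P = 100
    runs = natural_selection_runs([random.random() for _ in range(100_000)], P, P)
    steady = runs[2:-2]                   # skip start-up and final runs
    print(sum(steady) / len(steady) / P)  # compare with e = 2.718...
```

Keeping only current-run records in the heap mirrors the point made above: the "dead" records are rerouted to the reservoir instead of cluttering the tree.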

The  “natural”  law  for  run  lengths  can  be  proved  by  considering  the  snowplow 
of  Fig.  64  again,  and  applying  elementary  calculus.  Let  L be  the  length  of  the 
track, and let x(t) be the position of the snowplow at time t, for 0 ≤ t ≤ T.
The  reservoir  is  assumed  to  be  full  at  time  T,  when  the  snow  stops  temporarily 
while  the  plow  returns  to  its  starting  position  (clearing  the  P units  of  snow 
remaining  in  its  path).  The  situation  is  the  same  as  before  except  that  the 
“balance  condition”  is  different;  instead  of  P units  of  snow  on  the  road  at  all 
times,  we  have  P units  of  snow  in  front  of  the  plow,  and  the  reservoir  (behind 
the  plow)  gets  up  to  P'  = P units.  The  snowplow  advances  by  dx  during  a 
time  interval  dt  if  h(x,t)dx  records  are  output,  where  h(x,t)  is  the  height  of 
the snow at time t and position x = x(t), measured in suitable units; hence
h(x,t) = h(x, 0) + Kt for all x, where K is the rate of snowfall. Since the
number of records in memory stays constant, h(x,t)dx is also the number of
records that are input ahead of the plow, namely K dt (L − x) (see Fig. 67).
Thus

$$\frac{dx}{dt} = \frac{K(L - x)}{h(x,t)}. \qquad (2)$$

Fortunately, it turns out that h(x, t) is constant, equal to KT, whenever x = x(t)
and 0 ≤ t ≤ T, since the snow falls steadily at position x(t) for T − t units of time
after the plow passes that point, plus t units of time before it comes back. In
other words, the plow sees all snow at the same height on its journey, assuming
that a steady state has been reached where each journey is the same. Hence
the total amount of snow cleared (the run length) is LKT; and the amount of
snow in memory is the amount cleared after time T, namely KT(L − x(T)). The
solution to (2) such that x(0) = 0 is

$$x(t) = L\bigl(1 - e^{-t/T}\bigr); \qquad (3)$$

hence P = LKTe^{−1} = (run length)/e; and this is what we set out to prove.
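The derivation can be checked numerically. The sketch below integrates the differential equation for the plow's position with h(x, t) = KT by Euler's method, then compares x(T)/L with 1 − 1/e and the run length LKT with e times P = KT(L − x(T)). The function name and the choice L = T = K = 1 are assumptions made for illustration; the two ratios do not depend on the units.

```python
import math


def snowplow_check(steps: int = 100_000) -> tuple[float, float]:
    """Euler-integrate dx/dt = K(L - x)/(KT); return (x(T)/L, run length / P)."""
    L = T = K = 1.0                       # arbitrary units (assumption)
    dt = T / steps
    x = 0.0
    for _ in range(steps):
        x += dt * K * (L - x) / (K * T)   # h(x, t) = KT along the plow's path
    run_length = L * K * T                # all snow cleared in one journey
    P = K * T * (L - x)                   # snow left in memory at time T
    return x / L, run_length / P


if __name__ == "__main__":
    frac, ratio = snowplow_check()
    print(frac, 1 - math.exp(-1))         # x(T)/L versus 1 - 1/e
    print(ratio, math.e)                  # run length / P versus e
```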
Exercises 21 through 23 show that this analysis can be extended to the case
of general P'; for example, when P' = 2P the average run length turns out to
be e^θ(e − θ)P, where θ = (e − √(e² − 4))/2, a result that probably wouldn't have
been guessed offhand! Table 2 shows the dependence of run length on reservoir
size; the usefulness of natural selection in a given computer environment can be
estimated by referring to this table. The table entries for reservoir size < P use
an improved technique that is discussed in exercise 27.

The  ideas  of  delayed  run  reconstitution  and  natural  selection  can  be  com- 
bined, as  discussed  by  T.  C.  Ting  and  Y.  W.  Wang  in  Comp.  J.  20  (1977), 
298-301. 


                                Table 2
                   RUN LENGTHS BY NATURAL SELECTION

  Reservoir size   Run length    k + θ   |  Reservoir size   Run length    k + θ
     0.10000P       2.15780P    0.32071  |     0.00000P       2.00000P    0.00000
     0.50000P       2.54658P    0.69952  |     0.43428P       2.50000P    0.65348
     1.00000P       2.71828P    1.00000  |     1.30432P       3.00000P    1.15881
     2.00000P       3.53487P    1.43867  |     1.95014P       3.50000P    1.42106
     3.00000P       4.16220P    1.74773  |     2.72294P       4.00000P    1.66862
     4.00000P       4.69446P    2.01212  |     4.63853P       5.00000P    2.16714
     5.00000P       5.16369P    2.24938  |    21.72222P      10.00000P    4.66667
    10.00000P       7.00877P    3.17122  |     5.29143P       5.29143P    2.31329

The quantity k + θ is defined in exercise 22, or (when k = 0) in exercise 27.
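The entries of Table 2 with reservoir size at least P can be checked from the relations stated in exercise 22. For κ = k + θ ≥ 1, the run length is e^θF(κ)P, and the reservoir size satisfies P'/P = k + 1 + e^θ(κF(κ) − Σ_{0≤j≤k} F(κ − j)), where F(κ) = F_k(θ) is the coefficient of z^k in e^{−θz}/(1 − ze^{1−z}). The sketch below takes those relations as given and does not prove them; the helper names are hypothetical. It expands 1/(1 − ze^{1−z}) = Σ_m z^m e^{m(1−z)} term by term.

```python
import math


def series_coeffs(k: int) -> list[float]:
    """c_n = [z^n] 1/(1 - z e^(1-z)) = sum_{m=0}^{n} e^m (-m)^(n-m) / (n-m)!, n <= k."""
    return [
        sum(math.e ** m * (-m) ** (n - m) / math.factorial(n - m) for m in range(n + 1))
        for n in range(k + 1)
    ]


def F_poly(k: int, theta: float) -> float:
    """F_k(theta) = [z^k] e^(-theta z) / (1 - z e^(1-z))."""
    c = series_coeffs(k)
    return sum((-theta) ** j / math.factorial(j) * c[k - j] for j in range(k + 1))


def natural_selection_row(kappa: float) -> tuple[float, float]:
    """Return (P'/P, run length / P) for kappa = k + theta >= 1."""
    k = math.floor(kappa)
    theta = kappa - k
    F_kappa = F_poly(k, theta)
    # F(kappa - j) = F_{k-j}(theta), since kappa - j has the same fractional part.
    tail = sum(F_poly(k - j, theta) for j in range(k + 1))
    reservoir = k + 1 + math.exp(theta) * (kappa * F_kappa - tail)
    return reservoir, math.exp(theta) * F_kappa


if __name__ == "__main__":
    for kappa in (1.0, 1.43867, 2.01212, 3.17122):
        ratio, run = natural_selection_row(kappa)
        print(f"k+theta={kappa:.5f}  P'/P={ratio:.5f}  run length={run:.5f}P")
```

For κ = 1.43867 this reproduces the closed form quoted in the text for P' = 2P, since that θ satisfies θ(e − θ) = 1.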

*Analysis  of  replacement  selection.  Let  us  now  return  to  the  case  of  replace- 
ment selection  without  an  auxiliary  reservoir.  The  snowplow  analogy  gives  us 
a fairly  good  indication  of  the  average  length  of  runs  obtained  by  replacement 
selection  in  the  steady-state  limit,  but  it  is  possible  to  get  much  more  precise 
information  about  Algorithm  R by  applying  the  facts  about  runs  in  permutations 
that  we  have  studied  in  Section  5.1.3.  For  this  purpose  it  is  convenient  to  assume 
that  the  input  file  is  an  arbitrarily  long  sequence  of  independent  random  real 
numbers  between  0 and  1. 

Let

$$g_P(z_1, z_2, \ldots, z_k) = \sum_{l_1, l_2, \ldots, l_k \ge 0} a_P(l_1, l_2, \ldots, l_k)\, z_1^{l_1} z_2^{l_2} \cdots z_k^{l_k}$$

be the generating function for run lengths produced by P-way replacement
selection on such a file, where a_P(l_1, l_2, ..., l_k) is the probability that the first
run has length l_1, the second has length l_2, ..., the kth has length l_k. The
following “independence theorem” is basic, since it reduces the analysis to the
case P = 1:

Theorem K. g_P(z_1, z_2, ..., z_k) = g_1(z_1, z_2, ..., z_k)^P.

Proof. Let the input keys be X_1, X_2, X_3, .... Algorithm R partitions them into
P subsequences, according to which external node position they occupy in the
tree; the subsequence containing X_n is determined by the values of X_1, ..., X_{n−1}.
Each  of  these  subsequences  is  therefore  an  independent  sequence  of  independent 
random  numbers  between  0 and  1.  Furthermore,  the  output  of  replacement 
selection is precisely what would be obtained by doing a P-way merge on these
subsequences;  an  element  belongs  to  the  jth  run  of  a subsequence  if  and  only  if 
it  belongs  to  the  jth  run  produced  by  replacement  selection  (since  LASTKEY  and 
KEY(Q)  belong  to  the  same  subsequence  in  step  R4). 

In  other  words,  we  might  just  as  well  assume  that  Algorithm  R is  being 
applied  to  P independent  random  input  files,  and  that  step  R4  reads  the  next 
record  from  the  file  corresponding  to  external  node  Q;  in  this  sense,  the  algorithm 
is equivalent to a P-way merge, with “stepdowns” marking the ends of the runs.

Thus the output has runs of lengths (l_1, ..., l_k) if and only if the subsequences
have runs of respective lengths (l_{11}, ..., l_{1k}), ..., (l_{P1}, ..., l_{Pk}), where
the l_{ij} are some nonnegative integers satisfying Σ_{1≤i≤P} l_{ij} = l_j for 1 ≤ j ≤ k.
It follows that

$$a_P(l_1, \ldots, l_k) = \sum_{\substack{l_{11} + \cdots + l_{P1} = l_1 \\ \vdots \\ l_{1k} + \cdots + l_{Pk} = l_k}} a_1(l_{11}, \ldots, l_{1k}) \cdots a_1(l_{P1}, \ldots, l_{Pk}),$$

and this is equivalent to the desired result. |

We have discussed the average length L_k of the kth run, when P = 1,
in Section 5.1.3, where the values are tabulated in Table 5.1.3-2. Theorem K
implies that the average length of the kth run for general P is P times as long
as the average when P = 1, namely L_kP; and the variance is also P times as
large, so the standard deviation of the run length is proportional to √P. These
results were first derived by B. J. Gassner about 1958.

Thus the first run produced by Algorithm R will be about (e − 1)P ≈ 1.718P
records long, for random data; the second run will be about (e² − 2e)P ≈ 1.952P
records long; the third, about 1.996P; and subsequent runs will be very close
to 2P records long until we get to the last two runs (see exercise 14). The
standard deviation of most of these run lengths is approximately √((4e − 10)P) ≈
0.934√P [CACM 6 (1963), 685-688]. Furthermore, exercise 5.1.3-10 shows that
the total length of the first k runs will be fairly close to (2k − 1/3)P, with a
standard deviation of (((2/3)k + 2/9)P)^{1/2}. The generating functions g_1(z, z, ..., z)
and g_1(1, ..., 1, z) are derived in exercises 5.1.3-9 and 11.
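These averages are easy to observe in a simulation. The sketch below is an illustration, not Algorithm R itself: a binary heap of (run number, key) pairs stands in for the tree of losers, and a key smaller than LASTKEY is tagged for the next run, which has the same effect as the run-number test of the algorithm. The function name `replacement_selection_runs` is hypothetical.

```python
import heapq
import random


def replacement_selection_runs(keys, P):
    """Lengths of the runs produced by P-way replacement selection (heap-based sketch)."""
    it = iter(keys)
    heap = [(0, key) for _, key in zip(range(P), it)]   # fill memory with P records
    heapq.heapify(heap)
    runs, current, count = [], 0, 0
    while heap:
        run, lastkey = heapq.heappop(heap)   # output the smallest eligible key
        if run != current:                   # a stepdown: a new run begins
            runs.append(count)
            current, count = run, 0
        count += 1
        nxt = next(it, None)
        if nxt is not None:
            # A key below LASTKEY cannot join the current run.
            heapq.heappush(heap, (run if nxt >= lastkey else run + 1, nxt))
    if count:
        runs.append(count)
    return runs


if __name__ == "__main__":
    random.seed(1)
    P = 100
    runs = replacement_selection_runs([random.random() for _ in range(200_000)], P)
    print(runs[0] / P, runs[1] / P)          # roughly 1.718 and 1.952 on average
    steady = runs[3:-2]
    print(sum(steady) / len(steady) / P)     # close to 2
```

A single sample of the first or second run fluctuates by about 0.9√P records, so averages over many files (or over the steady-state runs) give a better comparison with the figures above.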

The  analysis  above  has  assumed  that  the  input  file  is  infinitely  long,  but 
the proof of Theorem K shows that the same probability a_P(l_1, ..., l_k) would
be obtained in any random input sequence containing at least l_1 + ··· + l_k + P
elements. So the results above are applicable for, say, files of size N ≥ (2k + 1)P,
in  view  of  the  small  standard  deviation. 

We  will  be  seeing  some  applications  in  which  the  merging  pattern  wants 
some  of  the  runs  to  be  ascending  and  some  to  be  descending.  Since  the  residue 
accumulated  in  memory  at  the  end  of  an  ascending  run  tends  to  contain  numbers 
somewhat  smaller  on  the  average  than  random  data,  a change  in  the  direction 
of ordering decreases the average length of the runs. Consider, for example, a
snowplow that must make a U-turn every time it reaches an end of a straight
road; it will go very speedily over the area just plowed. The run lengths
when directions are reversed vary between 1.5P and 2P for random data (see
exercise  24). 

EXERCISES 

1. [10] What is Step 4, in the example of four-way merging at the beginning of this
section? 

2.  [12]  What  changes  would  be  made  to  the  tree  of  Fig.  63  if  the  key  061  were 
replaced  by  612? 

3.  [16]  (E.  F.  Moore.)  What  output  is  produced  by  four-way  replacement  selection 
when  it  is  applied  to  successive  words  of  the  following  sentence: 

fourscore  and  seven  years  ago  our  fathers  brought  forth 
on  this  continent  a new  nation  conceived  in  liberty  and 
dedicated  to  the  proposition  that  all  men  are  created  equal. 

(Use  ordinary  alphabetic  order,  treating  each  word  as  one  key.) 

4. [16] Apply four-way natural selection to the sentence in exercise 3, using a reservoir
of capacity 4.

5.  [00]  True  or  false:  Replacement  selection  using  a tree  works  only  when  P is  a 
power  of  2 or  the  sum  of  two  powers  of  2. 

6. [15] Algorithm R specifies that P must be ≥ 2; what comparatively small changes
to the algorithm would make it valid for all P ≥ 1?

7.  [17]  What  does  Algorithm  R do  when  there  is  no  input  at  all? 

8. [20] Algorithm R makes use of an artificial key “∞” that must be larger than
any possible key. Show that the algorithm might fail if an actual key were equal to ∞,
and explain how to modify the algorithm in case the implementation of a true ∞ is
inconvenient.

► 9.  [23]  How  would  you  modify  Algorithm  R so  that  it  causes  certain  specified  runs 
(depending  on  RC)  to  be  output  in  ascending  order,  and  others  in  descending  order? 

10. [26] The initial setting of the LOSER pointers in step R1 usually doesn't correspond
to any actual tournament, since external node P + j may not lie in the subtree below
internal node j. Explain why Algorithm R works anyway. [Hint: Would the algorithm
work if {LOSER(LOC(X[0])), ..., LOSER(LOC(X[P − 1]))} were set to an arbitrary
permutation of {LOC(X[0]), ..., LOC(X[P − 1])} in step R1?]

11.  [M20]  True  or  false:  The  probability  that  KEY(Q)  < LASTKEY  in  step  R4  is 
approximately  50%,  assuming  random  input. 

12.  [M46]  Carry  out  a detailed  analysis  of  the  number  of  times  each  portion  of 
Algorithm R is executed; for example, how often does step R6 set LOSER ← Q?

13.  [13]  Why  is  the  second  run  produced  by  replacement  selection  usually  longer  than 
the  first  run? 

► 14.  [HM25]  Use  the  snowplow  analogy  to  estimate  the  average  length  of  the  last  two 
runs  produced  by  replacement  selection  on  a long  sequence  of  input  data. 
15. [20] True or false: The final run produced by replacement selection never contains
more  than  P records.  Discuss  your  answer. 

16. [M26] Find a “simple” necessary and sufficient condition that a file R_1 R_2 ... R_N
will be completely sorted in one pass by P-way replacement selection. What is the
probability that this happens, as a function of P and N, when the input is a random
permutation of {1, 2, ..., N}?

17. [20] What is output by Algorithm R when the input keys are in decreasing order,
K_1 > K_2 > ··· > K_N?

► 18.  [22]  What  happens  if  Algorithm  R is  applied  again  to  an  output  file  that  was 
produced  by  Algorithm  R? 

19. [HM22] Use the snowplow analogy to prove that the first run produced by
replacement selection is approximately (e − 1)P records long.

20. [HM24] Approximately how long is the first run produced by natural selection,
when P = P'?

► 21.  [HM23]  Determine  the  approximate  length  of  runs  produced  by  natural  selection 
when  P'  < P. 

22. [HM40] The purpose of this exercise is to determine the average run length
obtained in natural selection, when P' ≥ P. Let κ = k + θ be a real number ≥ 1,
where k = ⌊κ⌋ and θ = κ mod 1, and consider the function F(κ) = F_k(θ), where F_k(θ)
is the polynomial defined by the generating function

$$\sum_{k \ge 0} F_k(\theta) z^k = e^{-\theta z}/(1 - z e^{1-z}).$$

Thus, F_0(θ) = 1, F_1(θ) = e − θ, F_2(θ) = e² − e − eθ + θ²/2, etc.

Suppose  that  a snowplow  starts  out  at  time  0 to  simulate  the  process  of  natural 
selection,  and  suppose  that  after  T units  of  time  exactly  P snowflakes  have  fallen  behind 
it.  At  this  point  a second  snowplow  begins  on  the  same  journey,  occupying  the  same 
position at time t + T as the first snowplow did at time t. Finally, at time κT, exactly
P'  snowflakes  have  fallen  behind  the  first  snowplow;  it  instantaneously  plows  the  rest 
of  the  road  and  disappears. 

Using this model to represent the process of natural selection, show that a run
length equal to e^θF(κ)P is obtained when

$$P'/P = k + 1 + e^{\theta}\Bigl(\kappa F(\kappa) - \sum_{j=0}^{k} F(\kappa - j)\Bigr).$$

23.  [HM35]  The  preceding  exercise  analyzes  natural  selection  when  the  records  from 
the  reservoir  are  always  read  in  the  same  order  as  they  were  written,  first-in-first- 
out.  Find  the  approximate  run  length  that  would  be  obtained  if  the  reservoir  contents 
from  the  preceding  run  were  read  in  completely  random  order,  as  if  the  records  in  the 
reservoir  had  been  thoroughly  shuffled  between  runs. 

24.  [HM39]  The  purpose  of  this  exercise  is  to  analyze  the  effect  caused  by  haphazardly 
changing  the  direction  of  runs  in  replacement  selection. 

a) Let g_P(z_1, z_2, ..., z_k) be a generating function defined as in Theorem K, but with
each of the k runs specified as to whether it is to be ascending or descending.
For example, we might say that all odd-numbered runs are ascending, all even-
numbered runs are descending. Show that Theorem K is valid for each of the 2^k
generating functions of this type.


b) As a consequence of (a), we may assume that P = 1. We may also assume that the
input is a uniformly distributed sequence of independent random numbers between
0 and 1. Let

$$a(x,y) = \begin{cases} e^{1-x} - e^{y-x}, & \text{if } x \le y;\\ e^{1-x}, & \text{if } x > y.\end{cases}$$

Given that f(x) dx is the probability that a certain ascending run begins with x,
prove that (∫_0^1 a(x,y) f(x) dx) dy is the probability that the following run begins
with y. [Hint: Consider, for each n ≥ 0, the probability that x < X_1 < ··· <
X_n > y, when x and y are given.]


c) Consider runs that change direction with probability p; in other words, the
direction of each run after the first is randomly chosen to be the same as that of the
previous run, q = 1 − p of the time, but it is to be in the opposite direction p of
the time. (Thus when p = 0, all runs have the same direction; when p = 1, the
runs alternate in direction; and when p = 1/2, the runs are independently random.)
Let

$$f_1(x) = 1, \qquad f_{n+1}(y) = p\int_0^1 a(x,y)\, f_n(1-x)\,dx + q\int_0^1 a(x,y)\, f_n(x)\,dx.$$

Show that the probability that the nth run begins with x is f_n(x) dx when the
(n − 1)st run is ascending, f_n(1 − x) dx when the (n − 1)st run is descending.

d) Find a solution f to the steady-state equations

$$f(y) = p\int_0^1 a(x,y)\, f(1-x)\,dx + q\int_0^1 a(x,y)\, f(x)\,dx.$$

[Hint: Show that f''(x) is independent of x.]

e) Show that the sequence f_n(x) in part (c) converges rather rapidly to the function
f(x) in part (d).

f) Show that the average length of an ascending run starting with x is e^{1−x}.

g)  Finally,  put  all  these  results  together  to  prove  the  following  theorem:  If  the 
directions  of  consecutive  runs  are  independently  reversed  with  probability  p in 
replacement  selection,  the  average  run  length  approaches  (6/(3  + p))P. 

(The case p = 1 of this theorem was first derived by Knuth [CACM 6 (1963),
685-688]; the case p = 1/2 was first proved by A. G. Konheim in 1970.)

25.  [HM40]  Consider  the  following  procedure: 

N1. Read a record into a one-word “reservoir.” Then read another record, R, and
let  K be  its  key. 

N2.  Output  the  reservoir,  set  LASTKEY  to  its  key,  and  set  the  reservoir  empty. 
N3. If K < LASTKEY then output R and set LASTKEY ← K and go to N5.

N4.  If  the  reservoir  is  nonempty,  return  to  N2;  otherwise  enter  R into  the  reservoir. 
N5.  Read  in  a new  record,  R,  and  let  K be  its  key.  Go  to  N3.  | 

This  is  essentially  equivalent  to  natural  selection  with  P = 1 and  with  P'  = 1 or  2 
(depending  on  whether  you  choose  to  empty  the  reservoir  at  the  moment  it  fills  or  at 
the moment it is about to overfill), except that it produces descending runs, and it
never  stops.  The  latter  anomalies  are  convenient  and  harmless  assumptions  for  the 
purposes  of  this  problem. 

Proceeding as in exercise 24, let f_n(x,y) dy dx be the probability that x and y are
the respective values of LASTKEY and K just after the nth time step N2 is performed.
Prove that there is a function g_n(x) of one variable such that f_n(x,y) = g_n(x) when
x < y, and f_n(x,y) = g_n(x) − e^{−y}(g_n(x) − g_n(y)) when x > y. This function g_n(x) is
defined by the relations g_1(x) = 1,


9n+i{x)  J e gn(u)  du  + J dv  (v 1)  J , du  ((ev  — l)gn(u)  + gn(v)) 

+ x f dv  f du((ev  - l)gn(u)  + gn(v)). 

J X j V 

Show  further  that  the  expected  length  of  the  nth  run  is 

$$\int_0^1 dx \int_0^x dy\,\bigl(g_n(x)(e^y - 1) + g_n(y)\bigr)\bigl(2 - \tfrac{1}{2}y^2\bigr) + \int_0^1 dx\,(1 - x)\, g_n(x)\, e^x.$$

[Note.  The  steady-state  solution  to  these  equations  appears  to  be  very  complicated; 
it  has  been  obtained  numerically  by  J.  McKenna,  who  showed  that  the  run  lengths 
approach a limiting value ≈ 2.61307209. Theorem K does not apply to natural selection,
so  the  case  P = 1 does  not  carry  over  to  other  P.] 

26.  [M33]  Considering  the  algorithm  in  exercise  25  as  a definition  of  natural  selection 
when  P'  = 1,  find  the  expected  length  of  the  first  run  when  P'  = r,  for  any  r > 0 as 
follows. 

a)  Show  that  the  first  run  has  length  n with  probability 

f n + r ' 
n 


(n  + r) 


j(n  + r + 1)!. 


b)  Define  “associated  Stirling  numbers”  j[^]]  by  the  rules 
Oil  IT  n 

— Om0 1 

LraJJ  LLra 


= 5; 
Prove  that 


(n  + m — 1) 


n — 1 
m 


+ 


n — 1 
m — 1 


n + r 
n 


for  n > 0. 


EC n + r\  ITrl" 

\k  + r)  [ _k\_ 

fc  = U 

c) Prove that the average length of the first run is therefore c_r e − r − 1, where


cr 


r 

E 


r + k + 1 
(r  + k)\ 


► 27.  [HM30]  (W.  Dobosiewicz.)  When  natural  selection  is  used  with  P'  < P,  we  need 
not  stop  forming  a run  when  the  reservoir  becomes  full;  we  can  store  records  that  do 
not  belong  to  the  current  run  in  the  main  priority  queue,  as  in  replacement  selection, 
until  only  P'  records  of  the  current  run  are  left.  Then  we  can  flush  them  to  the  output 
and  replace  them  with  the  reservoir  contents. 

How  much  better  is  this  method  than  the  simpler  approach  analyzed  in  exercise  21? 
28.  [25]  The  text  considers  only  the  case  that  all  records  to  be  sorted  have  a fixed  size. 
How  can  replacement  selection  be  done  reasonably  well  on  variable-length  records? 
29. [22] Consider the 2^k nodes of a complete binary tree that has been right-threaded,
illustrated  here  when  k = 3: 


(Compare with 2.3.1-(10); the top node is the list head, and the dotted lines are thread
links.  In  this  exercise  we  are  not  concerned  with  sorting  but  rather  with  the  structure 
of  complete  binary  trees  when  a list-head-like  node  0 has  been  added  above  node  1,  as 
in  the  “tree  of  losers,”  Fig.  63.) 

Show how to assign the 2^{n+k} internal nodes of a large tree of losers onto these
2^k host nodes so that (i) every host node holds exactly 2^n nodes of the large tree;
(ii)  adjacent  nodes  in  the  large  tree  either  are  assigned  to  the  same  host  node  or  to 
host  nodes  that  are  adjacent  (linked);  and  (iii)  no  two  pairs  of  adjacent  nodes  in  the 
large  tree  are  separated  by  the  same  link  in  the  host  tree.  [Multiple  virtual  processors 
in  a large  binary  tree  network  can  thereby  be  mapped  to  actual  processors  without 
undue  congestion  in  the  communication  links.] 

30. [M29] Prove that if n > k > 1, the construction in the preceding exercise is
optimum, in the sense that any 2^k-node host graph satisfying (i), (ii), and (iii) must
have at least 2^k + 2^{k−1} − 1 edges (links) between nodes.
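Procedure N of exercise 25 is easy to simulate. The sketch below uses a hypothetical function name and represents records by their keys. Unlike the procedure in the exercise, which never stops, this version stops when its finite input runs out, so the last entry may be a partial run. A new descending run is counted each time step N2 outputs the reservoir. With a long random input, the average of the later run lengths can be compared with the limiting value ≈ 2.61307209 quoted in the note to that exercise.

```python
import random


def procedure_n_runs(keys):
    """Lengths of the descending runs output by steps N1-N5 of exercise 25."""
    it = iter(keys)
    reservoir = next(it)                 # N1: one record into the reservoir,
    k = next(it)                         #     then read R and its key K
    runs = []
    while True:
        runs.append(1)                   # N2: output the reservoir (a new run),
        lastkey, reservoir = reservoir, None   # set LASTKEY, empty the reservoir
        while True:
            if k < lastkey:              # N3: R extends the descending run
                runs[-1] += 1
                lastkey = k
            elif reservoir is not None:  # N4: reservoir occupied, back to N2
                break
            else:
                reservoir = k            # N4: park R in the reservoir
            k = next(it, None)           # N5: read a new record
            if k is None:                # finite input exhausted
                return runs


if __name__ == "__main__":
    random.seed(3)
    runs = procedure_n_runs([random.random() for _ in range(400_000)])
    steady = runs[10:-1]
    print(sum(steady) / len(steady))     # compare with 2.61307209
```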

*5.4.2.  The  Polyphase  Merge 

Now  that  we  have  seen  how  initial  runs  can  be  built  up,  we  shall  consider  various 
patterns  that  can  be  used  to  distribute  them  onto  tapes  and  to  merge  them 
together  until  only  a single  run  remains. 

Let us begin by assuming that there are three tape units, T1, T2, and T3,
available; the technique of “balanced merging,” described near the beginning of
Section 5.4, can be used with P = 2 and T = 3, when it takes the following form:

B1. Distribute initial runs alternately on tapes T1 and T2.

B2. Merge runs from T1 and T2 onto T3; then stop if T3 contains only one run.
B3. Copy the runs of T3 alternately onto T1 and T2, then return to B2. |

If the initial distribution pass produces S runs, the first merge pass will produce
⌈S/2⌉ runs on T3, the second will produce ⌈S/4⌉, etc. Thus if, say, 17 ≤ S ≤ 32,
we will have 1 distribution pass, 5 merge passes, and 4 copy passes; in general,
if S > 1, the number of passes over all the data is 2⌈lg S⌉.

The  copying  passes  in  this  procedure  are  undesirable,  since  they  do  not 
reduce  the  number  of  runs.  Half  of  the  copying  can  be  avoided  if  we  use  a 
two-phase  procedure: 

A1. Distribute initial runs alternately on tapes T1 and T2.

A2. Merge runs from T1 and T2 onto T3; then stop if T3 contains only one run.
A3. Copy half of the runs from T3 onto T1.

A4. Merge runs from T1 and T3 onto T2; then stop if T2 contains only one run.
A5. Copy half of the runs from T2 onto T1. Return to A2. |
The number of passes over the data has been reduced to (3/2)⌈lg S⌉ + 1/2, since steps
A3 and A5 do only “half a pass”; about 25 percent of the time has therefore
been saved.
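Both pass counts can be checked by following the two procedures step by step on run counts alone. The sketch below assumes, as the text does, that every merge pass halves the number of runs (rounding up) and that a copy of half the runs counts as half a pass; the function names are hypothetical.

```python
def balanced_passes(S: int) -> int:
    """Passes for steps B1-B3 (three tapes, two-way merging), S >= 2 initial runs."""
    passes, runs = 1, S                  # B1: the distribution pass
    while True:
        runs = (runs + 1) // 2           # B2: merge T1 and T2 onto T3
        passes += 1
        if runs == 1:
            return passes
        passes += 1                      # B3: copy T3 back onto T1 and T2


def two_phase_passes(S: int) -> float:
    """Passes for steps A1-A5, counting each copy of half the runs as half a pass."""
    passes, runs = 1.0, S                # A1: the distribution pass
    while True:
        runs = (runs + 1) // 2           # A2 or A4: merge
        passes += 1
        if runs == 1:
            return passes
        passes += 0.5                    # A3 or A5: copy half of the runs


if __name__ == "__main__":
    print(balanced_passes(21), two_phase_passes(21))
```

For S = 21 this gives 10 passes for the balanced procedure and 8 for the two-phase procedure, in agreement with 2⌈lg S⌉ and (3/2)⌈lg S⌉ + 1/2.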

The copying can actually be eliminated entirely, if we start with F_n runs
on T1 and F_{n−1} runs on T2, where F_n and F_{n−1} are consecutive Fibonacci
numbers. Consider, for example, the case n = 7, S = F_n + F_{n−1} = 13 + 8 = 21:


Phase   Contents of T1              Contents of T2     Contents of T3     Remarks
  1     1,1,1,1,1,1,1,1,1,1,1,1,1   1,1,1,1,1,1,1,1                       Initial distribution
  2     1,1,1,1,1                                      2,2,2,2,2,2,2,2    Merge 8 runs to T3
  3                                 3,3,3,3,3          2,2,2              Merge 5 runs to T2
  4     5,5,5                       3,3                                   Merge 3 runs to T1
  5     5                                              8,8                Merge 2 runs to T3
  6                                 13                 8                  Merge 1 run to T2
  7     21                                                                Merge 1 run to T1


Here,  for  example,  “2, 2, 2, 2, 2, 2, 2, 2”  denotes  eight  runs  of  relative  length  2,  con- 
sidering each  initial  run  to  be  of  relative  length  1.  Fibonacci  numbers  are 
omnipresent  in  this  chart! 

Only phases 1 and 7 are complete passes over the data; phase 2 processes
only 16/21 of the initial runs, phase 3 only 15/21, etc., and so the total number
of passes comes to (21 + 16 + 15 + 15 + 16 + 13 + 21)/21 = 5 4/7 if we assume
that the initial runs have approximately equal length. By comparison, the
two-phase procedure above would have required 8 passes to sort these 21 initial runs.
We shall see that in general this “Fibonacci” pattern requires approximately
1.04 lg S + 0.99 passes, making it competitive with a four-tape balanced merge
although it requires only three tapes.
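The pass count can be reproduced by simulating the merge phases on lists of relative run lengths. In the sketch below (a hypothetical helper, assuming a perfect distribution with exactly one empty tape and initial runs of equal length), each phase merges one run from every nonempty tape onto the empty tape until some input tape runs out, and the work done is measured in initial runs processed.

```python
from fractions import Fraction


def polyphase_passes(distribution: list[int]) -> Fraction:
    """Passes over the data for a polyphase merge from a perfect distribution.

    distribution[i] = number of initial runs on tape i (exactly one entry is 0).
    """
    tapes = [[1] * count for count in distribution]   # relative run lengths
    total = sum(distribution)
    processed = total                    # the initial distribution is one full pass
    while sum(1 for t in tapes if t) > 1:
        out = next(i for i, t in enumerate(tapes) if not t)
        inputs = [t for i, t in enumerate(tapes) if i != out]
        for _ in range(min(len(t) for t in inputs)):
            tapes[out].append(sum(t.pop(0) for t in inputs))
        processed += sum(tapes[out])     # initial runs handled in this phase
    return Fraction(processed, total)


if __name__ == "__main__":
    print(polyphase_passes([13, 8, 0]))   # 39/7, i.e. 5 4/7 passes
```

The same function handles more tapes; for instance, the distribution (31, 30, 28, 24, 16, 0) on six tapes gives 609/129 passes.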

The same idea can be generalized to T tapes, for any T ≥ 3, using (T − 1)-way
merging. We shall see, for example, that the four-tape case requires only
about .703 lg S + 0.96 passes over the data. The generalized pattern involves
generalized Fibonacci numbers.

Consider  the  following  six-tape  example: 

Phase    T1      T2      T3      T4      T5      T6     Initial runs processed
  1     1^31    1^30    1^28    1^24    1^16     —      31 + 30 + 28 + 24 + 16 = 129
  2     1^15    1^14    1^12    1^8      —      5^16    16 × 5 = 80
  3     1^7     1^6     1^4      —      9^8     5^8     8 × 9 = 72
  4     1^3     1^2      —      17^4    9^4     5^4     4 × 17 = 68
  5     1^1      —      33^2    17^2    9^2     5^2     2 × 33 = 66
  6      —      65^1    33^1    17^1    9^1     5^1     1 × 65 = 65
  7     129^1    —       —       —       —       —      1 × 129 = 129

Here 1^31 stands for 31 runs of relative length 1, etc.; five-way merges have
been  used  throughout.  This  general  pattern  was  developed  by  R.  L.  Gilstad 
[Proc.  Eastern  Joint  Computer  Conf.  18  (1960),  143-148],  who  called  it  the 
polyphase  merge.  The  three-tape  case  had  been  discovered  earlier  by  B.  K.  Betz 
[unpublished  memorandum,  Minneapolis-Honeywell  Regulator  Co.  (1956)]. 

In  order  to  make  polyphase  merging  work  as  in  the  examples  above,  we 
need  to  have  a “perfect  Fibonacci  distribution”  of  runs  on  the  tapes  after  each 
phase. By reading the table above from bottom to top, we can see that the first
seven perfect Fibonacci distributions when T = 6 are {1,0,0,0,0}, {1,1,1,1,1},
{2,2,2,2,1}, {4,4,4,3,2}, {8,8,7,6,4}, {16,15,14,12,8}, and {31,30,28,24,16}.
The  big  questions  now  facing  us  are 

1.  What  is  the  rule  underlying  these  perfect  Fibonacci  distributions? 

2.  What  do  we  do  if  S does  not  correspond  to  a perfect  Fibonacci  distribution? 

3.  How  should  we  design  the  initial  distribution  pass  so  that  it  produces  the 
desired  configuration  on  the  tapes? 

4.  How  many  “passes”  over  the  data  will  a T-tape  polyphase  merge  require,  as 
a function  of  S (the  number  of  initial  runs)? 

We  shall  discuss  these  four  questions  in  turn,  first  giving  “easy  answers”  and 
then  making  a more  intensive  analysis. 

The  perfect  Fibonacci  distributions  can  be  obtained  by  running  the  pattern 
backwards,  cyclically  rotating  the  tape  contents.  For  example,  when  T = 6 we 
have  the  following  distribution  of  runs: 


Level    T1      T2      T3      T4      T5     Total     Final output will be on
  0       1       0       0       0       0        1      T1
  1       1       1       1       1       1        5      T6
  2       2       2       2       2       1        9      T5
  3       4       4       4       3       2       17      T4
  4       8       8       7       6       4       33      T3
  5      16      15      14      12       8       65      T2
  6      31      30      28      24      16      129      T1
  7      61      59      55      47      31      253      T6
  8     120     116     108      92      61      497      T5
  n     a_n     b_n     c_n     d_n     e_n      t_n      T(k)
 n+1  a_n+b_n a_n+c_n a_n+d_n a_n+e_n   a_n   t_n+4a_n    T(k−1)              (1)

(Tape  T6  will  always  be  empty  after  the  initial  distribution.) 

The rule for going from level n to level n + 1 shows that the condition

$$a_n \ge b_n \ge c_n \ge d_n \ge e_n \qquad (2)$$

will hold in every level. In fact, it is easy to see from (1) that

$$\begin{aligned}
e_n &= a_{n-1},\\
d_n &= a_{n-1} + e_{n-1} = a_{n-1} + a_{n-2},\\
c_n &= a_{n-1} + d_{n-1} = a_{n-1} + a_{n-2} + a_{n-3},\\
b_n &= a_{n-1} + c_{n-1} = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4},\\
a_n &= a_{n-1} + b_{n-1} = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4} + a_{n-5},
\end{aligned} \qquad (3)$$

where a_0 = 1 and where we let a_n = 0 for n = −1, −2, −3, −4.
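The level-to-level rule is short enough to run directly. The sketch below (hypothetical function name) starts from level 0, which is (1, 0, ..., 0), and maps (a, b, c, ...) to (a + b, a + c, ..., a + last, a), exactly as in the bottom row of the level table. With P = 5 it regenerates the six-tape table; with P = 2 it gives the consecutive Fibonacci pairs of the three-tape case.

```python
def perfect_distributions(P: int, levels: int) -> list[tuple[int, ...]]:
    """Perfect polyphase distributions on P input tapes, for levels 0..levels."""
    dist = [1] + [0] * (P - 1)           # level 0
    table = [tuple(dist)]
    for _ in range(levels):
        a = dist[0]
        # Tape j receives a_n plus the next tape's count; the last tape gets a_n.
        dist = [a + x for x in dist[1:]] + [a]
        table.append(tuple(dist))
    return table


if __name__ == "__main__":
    for n, row in enumerate(perfect_distributions(5, 8)):
        print(n, row, sum(row))
```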
The pth-order Fibonacci numbers are defined by the rules

$$F^{(p)}_n = F^{(p)}_{n-1} + F^{(p)}_{n-2} + \cdots + F^{(p)}_{n-p}, \quad \text{for } n \ge p;$$
$$F^{(p)}_n = 0, \quad \text{for } 0 \le n \le p-2; \qquad F^{(p)}_{p-1} = 1. \qquad (4)$$

In other words, we start with p − 1 0s, then 1, and then each number is the sum
of the preceding p values. When p = 2, this is the usual Fibonacci sequence,
and when p = 3 it has been called the Tribonacci sequence. Such sequences were
apparently first studied for p > 2 by Narayana Pandita in 1356 [see P. Singh,
Historia  Mathematica  12  (1985),  229-244],  then  many  years  later  by  V.  Schlegel 
in  El  Progreso  Matematico  4 (1894),  173-174.  Schlegel  derived  the  generating 
function 

$$\sum_{n \ge 0} F^{(p)}_n z^n = \frac{z^{p-1}}{1 - z - z^2 - \cdots - z^p} = \frac{z^{p-1} - z^p}{1 - 2z + z^{p+1}}. \qquad (5)$$


The last equation of (3) shows that the number of runs on T1 during a six-tape
polyphase merge is a fifth-order Fibonacci number: a_n = F^{(5)}_{n+4}.

In general, if we set P = T − 1, the polyphase merge distributions for T tapes
will correspond to Pth order Fibonacci numbers in the same way. The kth tape
gets

$$F^{(P)}_{n+P-2} + F^{(P)}_{n+P-3} + \cdots + F^{(P)}_{n+k-2}$$

initial runs in the perfect nth level distribution, for 1 ≤ k ≤ P, and the total
number of initial runs on all tapes is therefore

$$t_n = P F^{(P)}_{n+P-2} + (P-1) F^{(P)}_{n+P-3} + \cdots + F^{(P)}_{n-1}. \qquad (6)$$

This  settles  the  issue  of  “perfect  Fibonacci  distributions.”  But  what  should 
we  do  if  S is  not  exactly  equal  to  t.n,  for  any  n?  And  how  do  we  get  the  runs 
onto  the  tapes  in  the  first  place? 

When  S isn’t  perfect  (and  so  few  values  are),  we  can  do  just  as  we  did  in 
balanced  P- way  merging,  adding  artificial  “dummy  runs”  so  that  we  can  pretend 
S is  perfect  after  all.  There  are  several  ways  to  add  the  dummy  runs,  and  we 
aren  t.  ready  yet  to  analyze  the  “best”  way  of  doing  this.  We  shall  discuss  first 
a method  of  distribution  and  dummy-run  assignment  that  isn’t  strictly  optimal, 
although  it  has  the  virtue  of  simplicity  and  appears  to  be  better  than  all  other 
equally  simple  methods. 


Algorithm  D ( Polyphase  merge  sorting  with  “ horizontal ” distribution).  This 
algorithm  takes  initial  runs  and  disperses  them  to  tapes,  one  run  at  a time,  until 
the  supply  of  initial  runs  is  exhausted.  Then  it  specifies  how  the  tapes  are  to 
be  merged,  assuming  that  there  are  T = P + 1 > 3 available  tape  units,  using 
P~ way  merging.  Tape  T may  be  used  to  hold  the  input,  since  it  does  not  receive 
any  initial  runs.  The  following  tables  are  maintained: 

A [ jl , 1 < j < T:  The  perfect  Fibonacci  distribution  we  are  striving  for. 

D Ij3 , 1 < 3 S T:  Number  of  dummy  runs  assumed  to  be  present  at  the 
beginning  of  logical  tape  unit  number  j. 
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Fig.  68.  Polyphase  merge  sorting. 

TAPE  [j] , 1 < j < T:  Number  of  the  physical  tape  unit  corresponding  to  logical 
tape  unit  number  j. 

(It  is  convenient  to  deal  with  “logical  tape  unit  numbers”  whose  assignment  to 

physical  tape  units  varies  as  the  algorithm  proceeds.) 

Dl.  [Initialize.]  Set  A [j]  4-  D[j]  4—  1 and  TAPE  [j]  4-  j,  for  1 < j < T.  Set 

A [T]  4—  D [T]  4—  0 and  TAPE[T]  4—  T.  Then  set  l 4—  1,  j 4—  1. 

D2.  [Input  to  tape  j.}  Write  one  run  on  tape  number  j,  and  decrease  D[j]  by  1. 
Then  if  the  input  is  exhausted,  rewind  all  the  tapes  and  go  to  step  D5. 

D3.  [Advance  j]  If  D [j]  < D [j  + 1] , increase  j by  1 and  return  to  D2. 
Otherwise  if  D [j]  = 0,  go  on  to  D4.  Otherwise  set  j 4—  1 and  return  to  D2. 

D4.  [Up  a level.]  Set  l 4—  l + 1,  a 4—  A [1]  , and  then  for  j = 1,  2,  . . . , P (in 
this  order)  set  D[j]  4-  a + A[j  + 1]  - A [j]  and  A [j]  4-  a + A [ j + 1]  . 
(See  (i)  and  note  that  A[P  + 1]  is  always  zero.  At  this  point  we  will  have 

D [1]  > D [2]  > • • • > D [T]  .)  Now  set  j 4—  1 and  return  to  D2. 

D5.  [Merge.]  If  l = 0,  sorting  is  complete  and  the  output  is  on  TAPE[1] . Other- 
wise, merge  runs  from  TAPE  [1]  , . . . , TAPE  [P]  onto  TAPE  [T]  until  TAPE  [P] 
is  empty  and  D [P]  = 0.  The  merging  process  should  operate  as  follows, 
for  each  run  merged:  If  D[j]  > 0 for  all  j , 1 < j < P,  then  increase  D[T] 
by  1 and  decrease  each  D [j]  by  1 for  1 < j < P;  otherwise  merge  one  run 
from  each  TAPE  [j]  such  that  D [ j]  = 0,  and  decrease  D [j]  by  1 for  each 
other  j.  (Thus  the  dummy  runs  are  imagined  to  be  at  the  beginning  of  the 
tape  instead  of  at  the  ending.) 

D6.  [Down  a level.]  Set  l 4-Z-l.  Rewind  TAPE  [P]  and  TAPE  [T]  . (Actually  the 
rewinding  of  TAPE[P]  could  have  been  initiated  during  step  D5,  just  after 
its  last  block  was  input.)  Then  set  (TAPE  [1]  , TAPE  [2]  ,...,  TAPE  [T]  ) 4— 
(TAPE  [T],  TAPE  [1],...,  TAPE  [T  - 1]),  (D  [1]  , D [2]  , . . . , D [T]  ) 4—  (D[T], 
D [1] , . . . , D [T  — 1] ),  and  return  to  step  D5.  | 


272  SORTING 


Fig.  69.  The  order  in  which  runs  34  through  65  are 
distributed  to  tapes,  when  advancing  from  level  4 to 
level  5.  (See  the  table  of  perfect  distributions,  Eq.  (i).) 
Shaded  areas  represent  the  first  33  runs  that  were  dis- 
tributed when  level  4 was  reached.  The  bottom  row 
corresponds  to  the  beginning  of  each  tape. 
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63 
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The  distribution  rule  that  is  stated  so  succinctly  in  step  D3  of  this  algorithm 
is  intended  to  equalize  the  number  of  dummies  on  each  tape  as  well  as  possible. 
Figure  69  illustrates  the  order  of  distribution  when  we  go  from  level  4 (33  runs) 
to  level  5 (65  runs)  in  a six-tape  sort;  if  there  were  only,  say,  53  initial  runs, 
all  runs  numbered  54  and  higher  would  be  treated  as  dummies.  (The  runs  are 
actually  being  written  at  the  end  of  the  tape,  but  it  is  best  to  imagine  them  being 
written  at  the  beginning,  since  the  dummies  are  assumed  to  be  at  the  beginning.) 

We  have  now  discussed  the  first  three  questions  listed  above,  and  it  remains 
to  consider  the  number  of  “passes”  over  the  data.  Comparing  our  six-tape 
example  to  the  table  (i),  we  see  that  the  total  number  of  initial  runs  processed 
when  S — te  was  a5ti  + + a2t4  + axt5  + agte,  excluding  the  initial 

distribution  pass.  Exercise  4 derives  the  generating  functions 


tt(z)  ^ ^ dnZ 
n>  0 

Hz)  = J2 tnzn 

n>  1 


i 

1 — z — z2  — z3  — zA  — z5  ' 

5 z + 4z2  + 3z3  + 2 z4  + z5 
1 — z — z2  — zz  — z4  — z5  ' 


(7) 


It  follows  that,  in  general,  the  number  of  initial  runs  processed  when  S = tn 
is  exactly  the  coefficient  of  zn  in  a(z)t(z),  plus  tn  (for  the  initial  distribution 
pass).  This  makes  it  possible  to  calculate  the  asymptotic  behavior  of  polyphase 
merging,  as  shown  in  exercises  5 through  7,  and  we  obtain  the  following  results: 


Table  1 

APPROXIMATE  BEHAVIOR  OF  POLYPHASE  MERGE  SORTING 


Tapes 

Phases 

Passes 

Pass/phase 

Growth  ratio 

3 

2.078  In  5 + 0.672 

1.504  In  S + 0.992 

72% 

1.6180340 

4 

1.641  In  5 + 0.364 

1.015  In  S + 0.965 

62% 

1.8392868 

5 

1.524  In  S + 0.078 

0.863  In  5 + 0.921 

57% 

1 .9275620 

6 

1.479  In  S — 0.185 

0.795  In  5 + 0.864 

54% 

1.9659482 

7 

1.460  In  5 - 0.424 

0.762  In  S + 0.797 

52% 

1.9835828 

8 

1.451  In  S — 0.642 

0.744  In  5 + 0.723 

51% 

1.9919642 

10 

1.445  In  S - 1.017 

0.728  In  5 + 0.568 

50% 

1.9980295 

20 

1.443  In  S - 2.170 

0.721  In  5 -0.030 

50% 

1 .9999981 
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In  Table  1,  the  “growth  ratio”  is  lim,,-^  tn+i/tn,  the  approximate  factor  by 
which  the  number  of  runs  increases  at  each  level.  “Passes”  denotes  the  average 
number  of  times  each  record  is  processed,  namely  1/S  times  the  total  number 
of  initial  runs  processed  during  the  distribution  and  merge  phases.  The  stated 
number  of  passes  and  phases  is  correct  in  each  case  up  to  0(S~C),  for  some  e > 0, 
for  perfect  distributions  as  S — > oo. 

Figure  70  shows  the  average  number  of  times  each  record  is  merged,  as 
a function  of  S,  when  Algorithm  D is  used  to  handle  the  case  of  nonperfect 
numbers.  Note  that  with  three  tapes  there  are  “peaks”  of  relative  inefficiency 
occurring  just  after  the  perfect  distributions,  but  this  phenomenon  largely  dis- 
appears when  there  are  four  or  more  tapes.  The  use  of  eight  or  more  tapes  gives 
comparatively  little  improvement  over  six  or  seven  tapes. 


1 2 5 10  20  50  100  200  500  1000  2000  5000 


Initial  runs,  S 


Fig.  70.  Efficiency  of  polyphase  merge  using  Algorithm  D. 


A closer  look.  In  a balanced  merge  requiring  k passes,  every  record  is  processed 
exactly  k times  during  the  course  of  the  sort.  But  the  polyphase  procedure  does 
not  have  this  lack  of  bias;  some  records  may  get  processed  many  more  times 
than  others,  and  we  can  gain  speed  if  we  arrange  to  put  dummy  runs  into  the 
oft-processed  positions. 
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Let  us  therefore  study  the  polyphase  distribution  more  closely;  instead  of 
merely  looking  at  the  number  of  runs  on  each  tape,  as  in  (1),  let  us  associate 
with  each  run  its  merge  number , the  number  of  times  it  will  be  processed  during 
the  complete  polyphase  sort.  We  get  the  following  table  in  place  of  (1): 


Level 

Tl 

T2 

T3 

T4 

T5 

0 

0 

— 



1 

1 

1 

1 

1 

1 

2 

21 

21 

21 

21 

2 

3 

3221 

3221 

3221 

322 

32 

4 

43323221 

43323221 

4332322 

433232 

4332 

5 

5443433243323221 

544343324332322 

54434332433232 

544343324332 

54434332 

n 

An 

Bn 

Cn 

Dn 

En 

n + 1 

(An  + 1 )Bn 

(An  -f  1 )Cn 

(An  + 1 )Dn 

(An  + 1 )En 

An  + 1 

Here  An  is  a string  of  an  values  representing  the  merge  numbers  for  each  run 
on  Tl,  if  we  begin  with  the  level  n distribution;  Bn  is  the  corresponding  string 
for  T2;  etc.  The  notation  “( An  + 1 )Bnv  means  “An  with  all  values  increased 
by  1,  followed  by  f?„.” 

Figure  71(a)  shows  A5,  B5 , C5,  D5,  E5  tipped  on  end,  showing  how  the 
merge  numbers  for  each  run  appear  on  tape;  notice,  for  example,  that  the  run  at 
the  beginning  of  each  tape  will  be  processed  five  times,  while  the  run  at  the  end 
of  Tl  will  be  processed  only  once.  This  discriminatory  practice  of  the  polyphase 
merge  makes  it  much  better  to  put  a dummy  run  at  the  beginning  of  the  tape 
than  at  the  end.  Figure  71(b)  shows  an  optimum  order  in  which  to  distribute  runs 
for  a five-level  polyphase  merge,  placing  each  new  run  into  a position  with  the 
smallest  available  merge  number.  Algorithm  D is  not  quite  as  good  (see  Fig.  69), 
since  it  fills  some  “4”  positions  before  all  of  the  “3”  positions  are  used  up. 


(a) 


Beginning  of  tape 


Fig.  71.  Analysis  of  the  fifth-level  polyp 
numbers,  (b)  optimum  distribution  order. 
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The  recurrence  relations  (8)  show  that  each  of  Bn,  Cn,  D„,  and  En  are 
initial  substrings  of  An.  In  fact,  we  can  use  (8)  to  derive  the  formulas 

En  = (An~  i)  + 1, 

En  (An—iAn—2)  T 1, 

Cn  = (An-iAn-2An-3)  + 1,  (g) 

Bn  — (An_1An-2An-3An_4)  + 1, 

An  = (An-iAn_2An-3An-4An~5)  + 1, 

generalizing  Eqs.  (3),  which  treated  only  the  lengths  of  these  strings.  Further- 
more, the  rule  defining  the  A’s  implies  that  essentially  the  same  structure  is 
present  at  the  beginning  of  every  level;  we  have 


An  — 71  Qni  (lo) 

where  Qn  is  a string  of  an  values  defined  by  the  law 

Qn  = Qn  — l (Qn  — 2 + l)(<?n-3  + 2)(Qn-4  + 3)(Qn-5  + 4),  for  71  > 1; 

Q o = 0;  Qn  — e (the  empty  string)  for  n < 0.  (11) 

Since  Qn  begins  with  Qn- 1,  we  can  consider  the  infinite  string  Qoo,  whose  first 
an  elements  are  equal  to  Qn\  this  string  Qx  essentially  characterizes  all  the 
merge  numbers  in  polyphase  distribution.  In  the  six-tape  case, 

Qoo  = 011212231223233412232334233434412232334233434452334344534454512232-  • • . 

(12) 

Exercise  11  contains  an  interesting  interpretation  of  this  string. 

Given  that  An  is  the  string  m\m2  . . . mari,  let 

An(r)=rrai+i™2  + -"  + im“" 

be  the  corresponding  generating  function  that  counts  the  number  of  times  each 
merge  number  appears;  and  define  Bn(x),  Cn(x)7  Dn(x),  En(x)  similarly.  For 
example,  A4(x)  - x4  + x3  + x3  + x2  + x3  + x2  + x2  + x = x4  + 3x3  + 3x2  + x. 
Relations  (9)  tell  us  that 

En(x)  = x(An-x(x)) , 

Bn(x)  x(An— i(x)  -f-  An— 2(3?)), 

Cn(x)  — x(An-i(x)  + An_2{x)  + An^3(x)),  (13) 

Bn(x)  = x(An-\(x)  + An_2(x)  + An-3(x)  + A„_4(  x)), 

An(x)  = x(An- i(x)  + An-2(x)  + An_3(x)  + A„_4(  x)  + An_5(x)), 

for  n > 1,  where  A0(x)  = 1 and  An(x)  = 0 for  n = -1,  -2,  -3,  -4.  Hence 

Y,An(x)zn  = 

n>  0 


1 — x(z  + z2  + z3  + z4  + z5) 


Y,xk(z  + z2+z3  + z4  + z5)k. 

k> 0 (14) 
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Considering  the  runs  on  all  tapes,  we  let 

Tn(x)  = An(x)  + Bn(x)  + Cn(x)  + Dn(x)  + En(x),  n>  1;  (15) 

from  (13)  we  immediately  have 


Tn{x) 

hence 


5An-i(x)  + 4An_2(x)  + 3j4n_3(a;)  + 2An-4{x)  + An_5(x), 

'ST  T ( \ n — X^Z  + 4z2  + 3z3  + 2-g4  + z 5) 

“ " 1 - x(z  + z2  + z3  + z4  + z5)  ' 

n.  > 1 v 7 


(16) 


The  form  of  (16)  shows  that  it  is  easy  to  compute  the  coefficients  of  Tn(x): 


z 

22 

z3 

z4 

z5 

z6 

z7 

z8 

z9 

z10 

1"H 

1— 1 

z12 

z13 

z14 

X 

5 

4 

3 

2 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

x 2 

0 

5 

9 

12 

14 

15 

10 

6 

3 

1 

0 

0 

0 

0 

X3 

0 

0 

5 

14 

26 

40 

55 

60 

57 

48 

35 

20 

10 

4 

X4 

0 

0 

0 

5 

19 

45 

85 

140 

195 

238 

260 

255 

220 

170 

x5 

0 

0 

0 

0 

5 

24 

69 

154 

294 

484 

703 

918 

1088 

1168 

The  columns  of  this  tableau  give  Tn(x);  for  example,  T4(x)  = 2x  + 1 2x2  + 
14x3  + 5x4 . After  the  first  row,  each  entry  in  the  tableau  is  the  sum  of  the  five 
entries  just  above  and  to  the  left  in  the  previous  row. 

The  number  of  runs  in  a “perfect”  nth  level  distribution  is  Tn(l),  and  the 
total  amount  of  processing  as  these  runs  are  merged  is  the  derivative,  7^(1). 
Now 


n>l 


5z  + 4z2  + 3z3  + 2z4  + z5 
(l  - x(z  + z2  + z3  + z4  + z5))2  ’ 


(18) 


setting  x — 1 in  (16)  and  (18)  gives  a result  in  agreement  with  our  earlier 
demonstration  that  the  merge  processing  for  a perfect  nth  level  distribution  is 
the  coefficient  of  zn  in  a(z)t(z);  see  (7). 

We  can  use  the  functions  Trl(x)  to  determine  the  work  involved  when  dummy 
runs  are  added  in  an  optimum  way.  Let  En(m)  be  the  sum  of  the  smallest  m 
merge  numbers  in  an  nth  level  distribution.  These  values  are  readily  calculated 
by  looking  at  the  columns  of  (17),  and  we  find  that  E„(m)  is  given  by 

m = l 2 3 4 5 6 7 8 9 10  11  12  13  14  15  16  17  18  19  20  21 


n=l  1234  5oooooooooooooooooooooooooooooooo 
n = 2 1234  6 8 10  12  14  oooooooooooooooooooooooo 

n = 3 1 2 3 5 7 9 11  13  15  17  19  21  24  27  30  33  36  00  00  00  00 

n = 4 1 2 4 6 8 10  12  14  16  18  20  22  24  26  29  32  35  38  41  44  47  (19) 

n = 5 1 3 5 7 9 11  13  15  17  19  21  23  25  27  29  32  35  38  41  44  47 

n = 6 2 4 6 8 10  12  14  16  18  20  22  24  26  28  30  33  36  39  42  45  48 

n = 1 2 4 6 8 10  12  14  16  18  20  23  26  29  32  35  38  41  44  47  50  53 

For  example,  if  we  wish  to  sort  17  runs  using  a level-3  distribution,  the  total 
amount  of  processing  is  E3(17)  = 36;  but  if  we  use  a level-4  or  level-5  distribution 


5.4.2 


THE  POLYPHASE  MERGE  277 


Table  2 

NUMBER  OF  RUNS  FOR  WHICH  A GIVEN  LEVEL  IS  OPTIMUM 


Level 

T = 3 

T = 4 

T = 5 

T = 6 

T = 7 

T = 8 

T = 9 

T = 10 

1 

2 

2 

2 

2 

2 

2 

2 

2 

Mi 

2 

3 

4 

5 

6 

7 

8 

9 

10 

m2 

3 

4 

6 

8 

10 

12 

14 

16 

18 

m3 

4 

6 

10 

14 

14 

17 

20 

23 

26 

m4 

5 

9 

18 

23 

29 

20 

24 

28 

32 

m5 

6 

14 

32 

35 

43 

53 

27 

32 

37 

m6 

7 

22 

55 

76 

61 

73 

88 

35 

41 

m7 

8 

35 

96 

109 

154 

98 

115 

136 

44 

Ms 

9 

56 

173 

244 

216 

283 

148 

171 

199 

m9 

10 

90 

280 

359 

269 

386 

168 

213 

243 

Mio 

11 

145 

535 

456 

779 

481 

640 

240 

295 

Mn 

12 

234 

820 

1197 

1034 

555 

792 

1002 

330 

M12 

13 

378 

1635 

1563 

1249 

1996 

922 

1228 

1499 

M13 

14 

611 

2401 

4034 

3910 

2486 

1017 

1432 

1818 

Mn 

15 

988 

4959 

5379 

4970 

2901 

4397 

1598 

2116 

m15 

16 

1598 

7029 

6456 

5841 

10578 

5251 

1713 

2374 

Mi6 

17 

2574 

14953 

18561 

19409 

13097 

5979 

8683 

2576 

M17 

18 

3955 

20583 

22876 

23918 

15336 

6499 

10069 

2709 

Mia 

19 

6528 

44899 

64189 

27557 

17029 

30164 

11259 

15787 

Mig 

and  position  the  dummy  runs  optimally,  the  total  amount  of  processing  during 
the  merge  phases  is  only  E4(17)  = E5(17)  = 35.  It  is  better  to  use  level  4,  even 
though  17  corresponds  to  a “perfect”  level-3  distribution!  Indeed,  as  S gets  large 
it  turns  out  that  the  optimum  number  of  levels  is  many  more  than  that  used  in 
Algorithm  D. 

Exercise  14  proves  that  there  is  a nondecreasing  sequence  of  numbers  Mn 
such  that  level  n is  optimum  for  Mn  < S < Mn+1,  but  not  for  S > Mn+1.  In 
the  six-tape  case  the  table  of  En(m)  we  have  just  calculated  shows  that 

M0  = 0,  Mi  = 2,  M2  = 6,  M3  = 10,  M4  = 14. 

The  discussion  above  treats  only  the  case  of  six  tapes,  but  it  is  clear  that  the 
same  ideas  apply  to  polyphase  merging  with  T tapes  for  any  T > 3;  we  simply 
replace  5 by  P = T - 1 in  all  appropriate  places.  Table  2 shows  the  sequences 
M„  obtained  for  various  values  of  T.  Table  3 and  Fig.  72  indicate  the  total 
number  of  initial  runs  that  are  processed  after  making  an  optimum  distribution 
of  dummy  runs.  (The  formulas  that  appear  at  the  bottom  of  Table  3 should 
be  taken  with  a grain  of  salt,  since  they  are  least-squares  fits  over  the  range 
1 $ S < 5000,  or  1 < 5 < 10000  for  T — 3;  this  leads  to  somewhat  erratic 
behavior  because  the  given  range  of  S values  is  not  equally  favorable  for  all  T. 
As  S > oc,  the  number  of  initial  runs  processed  after  an  optimum  polyphase 
distribution  is  asymptotically  SlogP  5,  but  convergence  to  this  asymptotic  limit 
is  extremely  slow.) 
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Fig.  72.  Efficiency  of  polyphase  merge  with  optimum  initial  distribution,  using  the 
same  assumptions  as  Fig.  70. 


Table  3 

INITIAL  RUNS  PROCESSED  DURING  AN  OPTIMUM  POLYPHASE  MERGE 


s 

T = 3 

T = 4 

T = 5 

T = 6 

II 

T = 8 

T = 9 

T = 10 

10 

36 

24 

19 

17 

15 

14 

13 

12 

20 

90 

60 

49 

44 

38 

36 

34 

33 

50 

294 

194 

158 

135 

128 

121 

113 

104 

100 

702 

454 

362 

325 

285 

271 

263 

254 

500 

4641 

3041 

2430 

2163 

1904 

1816 

1734 

1632 

1000 

10371 

6680 

5430 

4672 

4347 

3872 

3739 

3632 

5000 

63578 

41286 

32905 

28620 

26426 

23880 

23114 

22073 

5 

f (1-51 

0.951 

0.761 

0.656 

0.589 

0.548 

0.539 

0.488)  x S In  S'  + 

l (-H 

+ .14 

+.16 

+ .19 

+ .21 

+ .20 

+ .02 

+ .18)  X S 

Table  4 shows  how  the  distribution  method  of  Algorithm  D compares  with 
the  results  of  optimum  distribution  in  Table  3.  It  is  clear  that  Algorithm  D is 
not  very  close  to  the  optimum  when  S and  T become  large;  but  it  is  not  clear 
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Table  4 

INITIAL  RUNS  PROCESSED  DURING  THE  STANDARD  POLYPHASE  MERGE 


s 

T = 3 

T = 4 

T = 5 

T = 6 

T = 7 

T = 8 

T = 9 

T = 10 

10 

36 

24 

19 

17 

15 

14 

13 

12 

20 

90 

62 

49 

44 

41 

37 

34 

33 

50 

294 

194 

167 

143 

134 

131 

120 

114 

100 

714 

459 

393 

339 

319 

312 

292 

277 

500 

4708 

3114 

2599 

2416 

2191 

2100 

2047 

2025 

1000 

10730 

6920 

5774 

5370 

4913 

4716 

4597 

4552 

5000 

64740 

43210 

36497 

32781 

31442 

29533 

28817 

28080 

how  to  do  much  better  than  Algorithm  D without  considerable  complication  in 
such  cases,  especially  if  we  do  not  know  S in  advance.  Fortunately,  we  rarely 
have  to  worry  about  large  S (see  Section  5.4.6),  so  Algorithm  D is  not  too  bad 
in  practice;  in  fact,  it’s  pretty  good. 

Polyphase  sorting  was  first  analyzed  mathematically  by  W.  C.  Carter  [Proc. 
IFIP  Congress  (1962),  62-66].  Many  of  the  results  stated  above  about  optimal 
dummy  run  placement  are  due  originally  to  B.  Sackman  and  T.  Singer  [“A  vector 
model  for  merge  sort  analysis,”  an  unpublished  paper  presented  at  the  ACM  Sort 
Symposium  (November  1962),  21  pages],  Sackman  later  suggested  the  horizontal 
method  of  distribution  used  in  Algorithm  D.  Donald  Shell  [CACM  14  (1971), 
713  -719;  15  (1972),  28]  developed  the  theory  independently,  noted  relation  (10), 
and  made  a detailed  study  of  several  different  distribution  algorithms.  Further 
instructive  developments  and  refinements  have  been  made  by  Derek  A.  Zave 
[SICOMP  6 (1977),  1-39];  some  of  Zave’s  results  are  discussed  in  exercises  15 
through  17.  The  generating  function  (16)  was  first  investigated  by  W.  Burge 
[Proc.  IFIP  Congress  (1971),  1,  454-459]. 

But  what  about  rewind  time?  So  far  we  have  taken  “initial  runs  processed” 
as  the  sole  measure  of  efficiency  for  comparing  tape  merge  strategies.  But  after 
each  of  phases  2 through  6,  in  the  examples  at  the  beginning  of  this  section, 
it  is  necessary  for  the  computer  to  wait  for  two  tapes  to  rewind;  both  the 
previous  output  tape  and  the  new  current  output  tape  must  be  repositioned  at 
the  beginning,  before  the  next  phase  can  proceed.  This  can  cause  a significant 
delay,  since  the  previous  output  tape  generally  contains  a significant  percentage 
of  the  records  being  sorted  (see  the  “pass/phase”  column  in  Table  1).  It  is 
a shame  to  have  the  computer  twiddling  its  thumbs  during  all  these  rewind 
operations,  since  useful  work  could  be  done  with  the  other  tapes  if  we  used  a 
different  merging  pattern. 

A simple  modification  of  the  polyphase  procedure  will  overcome  this  prob- 
lem, although  it  requires  at  least  five  tapes  [see  Y.  Cesari,  Thesis,  U.  of  Paris 
(1968),  25-27,  where  the  idea  is  credited  to  J.  Caron],  Each  phase  in  Caron’s 
scheme  merges  runs  from  T — 3 tapes  onto  another  tape,  while  the  remaining 
two  tapes  are  rewinding. 
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For  example,  consider  the  case  of  six  tapes  and  49  initial  runs.  In  the 
following  tableau,  R denotes  rewinding  during  the  phase,  and  T5  is  assumed  to 
contain  the  original  input: 


Phase 

T1 

T2 

T3 

T4 

T5 

T6 

Write  time 

Rewind  time 

1 

l11 

l17 

I13 

l8 

— 

(R) 

49 

17 

2 

(R) 

l9 

l5 

— 

R 

38 

8x3  = 

24 

49  - 17  = 32 

3 

l6 

l4 

— 

R 

35 

R 

5x3  = 

15 

max(8,  24) 

4 

l2 

— 

R 

54 

R 

34 

4x5  = 

20 

max(13, 15) 

5 

— 

R 

72 

R 

33 

32 

2x7  = 

14 

max(17,  20) 

6 

R 

ll2 

R 

52 

31 

— 

2 x 11  = 

22 

max(ll,  14) 

7 

151 

R 

71 

51 

— 

R 

1 x 15  = 

15 

max(22,  24) 

8 

R 

ll1 

7° 

R 

231 

1 x 23  = 

23 

max(15, 15) 

9 

151 

ll1 

— 

R 

33° 

R 

0 x 33  = 

0 

max(20, 23) 

10 

(15°) 

— 

R 

491 

(R) 

(23°) 

1 x 49  = 

49 

14 

Here  all  the  rewind  time 

is 

essentially  overlapped, 

except  in  phase  9 

(a  “dummy 

phase”  that  prepares  for  the  final  merge),  and  after  the  initial  distribution  phase 
(when  all  tapes  are  rewound).  If  t is  the  time  to  merge  the  number  of  records  in 
one  initial  run,  and  if  r is  the  time  to  rewind  over  one  initial  run,  this  process 
takes  about  182t  + 40r  plus  the  time  for  initial  distribution  and  final  rewind.  The 
corresponding  figures  for  standard  polyphase  using  Algorithm  D are  140f  + 104r, 
which  is  slightly  worse  when  r = slightly  better  when  r = 

Everything  we  have  said  about  standard  polyphase  can  be  adapted  to  Caron’s 
polyphase;  for  example,  the  sequence  an  now  satisfies  the  recurrence 

an—2  an-3  T &n— 4 (20) 

instead  of  (3).  The  reader  will  find  it  instructive  to  analyze  this  method  in  the 
same  way  we  analyzed  standard  polyphase,  since  it  will  enhance  an  understand- 
ing of  both  methods.  (See,  for  example,  exercises  19  and  20.) 

Table  5 gives  statistics  about  Polyphase  Caron  that  are  analogous  to  the 
facts  about  Polyphase  Ordinaire  in  Table  1.  Notice  that  Caron’s  method  actually 
becomes  superior  to  polyphase  on  eight  or  more  tapes,  in  the  number  of  runs 
processed  as  well  as  in  the  rewind  time,  even  though  it  does  (T  — 3)-way  merging 
instead  of  ( T — l)-way  merging! 


Table  5 

APPROXIMATE  BEHAVIOR  OF  CARON’S  POLYPHASE  MERGE  SORTING 


Tapes 

Phases 

Passes 

Pass/phase 

Growth  ratio 

5 

3.556  In  S + 0.158 

1.463  In  S + 1.016 

41% 

1.3247180 

6 

2.616  In  5 - 

0.166 

0.951  In  S+  1.014 

36% 

1.4655712 

7 

2.337  In  5 - 

0.472 

0.781  In  S+  1.001 

33% 

1.5341577 

8 

2.216  In  S - 

0.762 

0.699  In  S + 0.980 

32% 

1.5701473 

9 

2.156  In  S - 

1.034 

0.654  In  S + 0.954 

30% 

1.5900054 

10 

2.124  In  S - 

1.290 

0.626  In  S + 0.922 

29% 

1.6013473 

20 

2.078  In  5 - 

3.093 

0.575  In  S + 0.524 

28% 

1.6179086 
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This  may  seem  paradoxical  until  we  realize  that  a high  order  of  merge  does 
not  necessarily  imply  an  efficient  sort.  As  an  extreme  example,  consider  placing 
one  run  on  T1  and  n runs  on  T2,  T3,  T4,  T5;  if  we  alternately  do  five-way 
merging  to  T6  and  T1  until  T2,  T3,  T4,  T5  are  empty,  the  processing  time  is 
(2 n2  + 3n)  initial  run  lengths,  essentially  proportional  to  S 2 instead  of  5 log  5, 
although  five- way  merging  was  done  throughout. 

Tape  splitting.  Efficient  overlapping  of  rewind  time  is  a problem  that  arises 
in  many  applications,  not  just  sorting,  and  there  is  a general  approach  that  can 
often  be  used.  Consider  an  iterative  process  that  uses  two  tapes  in  the  following 
way: 


T1 

T2 

Phase  1 

Output  1 
Rewind 



Phase  2 

Input  1 
Rewind 

Output  2 
Rewind 

Phase  3 

Output  3 
Rewind 

Input  2 
Rewind 

Phase  4 

Input  3 
Rewind 

Output  4 
Rewind 

and  so  on,  where  “Output  fc”  means  write  the  kth  output  file  and  “Input  Ac” 
means  read  it.  The  rewind  time  can  be  avoided  when  three  tapes  are  used,  as 
suggested  by  C.  Weisert  [ CACM  5 (1962),  102]: 


T1 

T2 

T3 

Phase  1 

Output  1.1 
Output  1.2 
Rewind 

Output  1.3 

— 

Phase  2 

Input  1.1 
Input  1.2 
Rewind 

Output  2.1 
Rewind 

Input  1.3 

Output  2.2 
Output  2.3 

Phase  3 

Output  3.1 
Output  3.2 
Rewind 

Input  2.1 
Rewind 
Output  3.3 

Rewind 
Input  2.2 
Input  2.3 

Phase  4 

Input  3.1 
Input  3.2 
Rewind 

Output  4.1 
Rewind 

Input  3.3 

Rewind 
Output  4.2 
Output  4.3 

and  so  on.  Here  “Output  k.j”  means  write  the  jth  third  of  the  fcth  output 
file,  and  “Input  k.j ” means  read  it.  Virtually  all  of  the  rewind  time  will  be 
eliminated  if  rewinding  is  at  least  twice  as  fast  as  the  read/write  speed.  Such  a 
procedure,  in  which  the  output  of  each  phase  is  divided  between  tapes,  is  called 
“tape  splitting.” 
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R.  L.  McAllester  [CACM  7 (1964),  158-159]  has  shown  that  tape  splitting 
leads  to  an  efficient  way  of  overlapping  the  rewind  time  in  a polyphase  merge. 
His  method  can  be  used  with  four  or  more  tapes,  and  it  does  (T— 2)-way  merging. 

Assuming  once  again  that  we  have  six  tapes,  let  us  try  to  design  a merge 
pattern  that  operates  as  follows,  splitting  the  output  on  each  level,  where  “I” , 
“O”,  and  “R”,  respectively,  denote  input,  output,  and  rewinding: 


Level 

T1 

T2 

T3 

T4 

T5 

T6 

Number  of  runs  output 

7 

I 

I 

I 

I 

R 

O 

u7 

I 

I 

I 

I 

O 

R 

v7 

6 

I 

I 

I 

R 

O 

I 

Uq 

I 

I 

I 

O 

R 

I 

V6 

5 

I 

I 

R 

O 

I 

I 

Us 

I 

I 

O 

R 

I 

I 

Vs 

4 

I 

R 

O 

I 

I 

I 

u4 

I 

O 

R 

I 

I 

I 

v4 

3 

R 

O 

I 

I 

I 

I 

U3 

O 

R 

I 

I 

I 

I 

V3 

2 

O 

I 

I 

I 

I 

R 

U2 

R 

I 

I 

I 

I 

O 

V2 

1 

I 

I 

I 

I 

R 

O 

U\ 

I 

I 

I 

I 

O 

R 

V\ 

0 

I 

I 

I 

R 

O 

I 

Uo 

I 

I 

I 

O 

R 

I 

VO 

In  order  to  end  with  one  run  on  T4  and  all  other  tapes  empty,  we  need  to  have 

vo  = 1, 
uo  + v\  = 0, 
ui  + v2  = u0  + u0, 

u2  + v3  — Ui  + Vi  + Wo  + vq, 

u3  + V4  = U2  + V2  + Ui  + Vi  + u0  + v0, 

V-4  + V5  = Uz  + U3  + U2  + V2  + Ui  + Vi  + M0  + Vo, 

U5  + V6  — U4  + V4  + «3  + V3  + U2  + V2  + Ml  + Vi , 

etc.;  in  general,  the  requirement  is  that 

Un  + Vn+\  = 1 + Un_!  + Un  — 2 + Vn  — 2 + U„_3  + t)„_3  + U„_4  + U„_4  (22) 

for  all  n > 0,  if  we  regard  Uj  = Vj  = 0 for  all  j < 0. 

There  is  no  unique  solution  to  these  equations;  indeed,  if  we  let  all  the  u’s  be 
zero,  we  get  the  usual  polyphase  merge  with  one  tape  wasted!  But  if  we  choose 
un  vn+x,  the  rewind  time  will  be  satisfactorily  overlapped. 

McAllester  suggested  taking 

un  — vn—x  + vn—2  + un_3  + u„_4, 
vn+ 1 — un- 1 + Wn_2  + M„_  3 + Mn_  4, 
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so  that  the  sequence 

(xo,  1 ‘ •*'2i  -Li  ■ %4  ? *^5  ? • • • ) — (no  , Uo  5 ^1  ? ^1  j C2 . U2 , . . . ) 

satisfies  the  uniform  recurrence  zn  = :rn_3  + x„_5  + xn_7  + x„_9.  However,  it 
turns  out  to  be  better  to  let 

vn+l  = Mn-1  + Vn-l  + Mn-2  + vn-2,  , , 

(23) 

^n  — 3 "b  ^n  — 3 ~b  ^n— 4 “b  ^n— 4j 

this  sequence  not  only  leads  to  a slightly  better  merging  time,  it  also  has  the 
great  virtue  that  its  merging  time  can  be  analyzed  mathematically.  McAllester’s 
choice  is  extremely  difficult  to  analyze  because  runs  of  different  lengths  may 
occur  during  a single  phase;  we  shall  see  that  this  does  not  happen  with  (23). 

We  can  deduce  the  number  of  runs  on  each  tape  on  each  level  by  working 
backwards  in  the  pattern  (21),  and  we  obtain  the  following  sorting  scheme: 


Level 

T1 

T2 

T3 

T4 

T5 

T6 

Write  time  Rewind  time 

^23 

l21 

l17 

I10 

— 

l11 

82 

23 

7 

I19 

l17 

I13 

l6 

R 

11344 

4 x 4 = 16 

82  - 23 

l13 

l11 

l7 

— 

46 

R 

6 x 4 = 24 

27 

6 

l10 

Is 

l4 

R 

49 

1844 

3 x 4 = 12 

10 

l6 

l4 

— 

44 

R 

1444 

4 x 4 = 16 

36 

5 

l5 

l3 

R 

4473 

48 

1344 

1x7  = 7 

17 

l2 

— 

73 

R 

45 

44 

3 x 7 = 21 

23 

4 

l1 

R 

73131 

4374 

44 

43 

1 x 13  = 13 

21 

— 

131 

R 

4271 

43 

42 

1 x 13  = 13 

34 

3 

R 

isHq1 

72131 

4X7X 

42 

41 

1 x 19  = 19 

23 

191 

R 

7H31 

71 

41 

— 

1 x 19  = 19 

32 

2 

19131° 

13H91 

7H31 

71 

41 

R 

0 x 31  = 0 

27 

R 

131 

7° 

— 

311 

1 x 31  = 31 

19 

1 

W^l0 

131 

7° 

R 

3U520 

0 x 52  = 0 ) 

| 

W^l0 

191 

131 

— 

52° 

R 

0 x 52  = 0 

> max(36, 31,  23) 

0 

19131° 

191 

131 

R 

52°82° 

31152° 

0 x 82  = 0 J 

1 

(31°) 

(19°) 

— 

821 

(R) 

(31°52°) 

1 x 82  = 82 

0 

Unoverlapped  rewinding  occurs  in  three  places:  when  the  input  tape  T5  is  being 
rewound  (82  units),  during  the  first  half  of  the  level  2 phase  (27  units),  and 
during  the  final  “dummy  merge”  phases  in  levels  1 and  0 (36  units).  So  we  may 
estimate  the  time  as  273f  + 145r;  the  corresponding  amount  for  Algorithm  D, 
268t  + 208r,  is  almost  always  inferior. 

Exercise  23  proves  that  the  run  lengths  output  during  each  phase  are  suc- 
cessively 

4, 4,  7, 13, 19,  31,  52,  82, 133, . . . , (24) 

a sequence  (ti,t2,t3, . . .)  satisfying  the  law 

tn  ^n  — 2 “b  2tn_3  -f-  tn_ 4 (25) 

if  we  regard  tn  = 1 for  n < 0.  We  can  also  analyze  the  optimum  placement 
of  dummy  runs,  by  looking  at  strings  of  merge  numbers  as  we  did  for  standard 
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polyphase  in  Eq.  (8): 

Final 

Level 

T1 

T2 

T3 

T4 

T6 

output  on 

i 

1 

1 

1 

1 

— 

T5 

2 

1 

1 

1 

— 

1 

T4 

3 

21 

21 

2 

2 

1 

T3 

4 

2221 

222 

222 

22 

2 

T2 

5 

23222 

23222 

2322 

23 

222 

T1 

6 

333323222 

33332322 

333323 

3333 

2322 

T6 

n 

An 

Bn 

Cn 

Dn 

En 

T(fc)  ' 

77  + 1 

(KEn  + l)Bn 

(KEn  + l)Cn 

(A"£n+l)Dn 

KEn+ 1 

A' 

T(fc-l) 

5.4.2 


where  An  = A'n  A'r[,  and  A"  consists  of  the  last  un  merge  numbers  of  An.  The  rule 
above  for  going  from  level  n to  level  n + 1 is  valid  for  any  scheme  satisfying  (22). 
When  we  define  the  u’s  and  v’s  by  (23),  the  strings  An, . . , , En  can  be  expressed 
in  the  following  rather  simple  way  analogous  to  (9): 

An  = (W„_1Wn_2Wn_3Wn_4)  + 1, 

Bn  = (W„_1Wn_2W„_3)  + 1, 

Cn  = (Wn^Wn_2)  + 1, 

Dn  — (W„_i)  + 1, 

En  = (W„_2W„_3)  + 1,  (27) 


(28) 


(29) 


where 

Wn  = (Wn_3Wn_4Vbn_2W„_3)  + 1 for  n > 0, 

Wo  = 0,  and  W„  — e for  n < 0. 

From  these  relations  it  is  easy  to  make  a detailed  analysis  of  the  six-tape  case. 

In  general,  when  there  are  T > 5 tapes,  we  let  P — T — 2,  and  we  define  the 
sequences  (un),  (vn)  by  the  rules 

1+4-1  ^n—i  A + * ' * + nn_ r T un_r, 

— V’n  — r—l  T ^n  — r— 1 T * * * T Un_p  T Vn~Pt  for  71  ^ 0, 
where  r = [P/ 2j;  Vo  = 1,  and  u„  — vn  — 0 for  n < 0.  So  if  wn  = un+vn,  we  have 
wn  = wn- 2 + • • ■ + Mn-r  + 2u>n-r-i  + Wn-r-2  + ' ' ' + U)n-p,  for  71  > 0;  (30) 

wo  = 1;  and  wn  — 0 for  n < 0.  The  initial  distribution  on  tapes  for  level 
7i+l  places  wn  + wn_i  + • • • + wn-p+k  runs  on  tape  k.  for  1 < k < P,  and 
wn_i  + • • • + wn-r  on  tape  T;  tape  T — 1 is  used  for  input.  Then  un  runs  are 
merged  to  tape  T while  T — 1 is  being  rewound;  vn  are  merged  to  T — 1 while  T 
is  rewinding;  u„_i  to  T — 1 while  T — 2 is  rewinding;  etc. 

Table  6 shows  the  approximate  behavior  of  this  procedure  when  5 is  not  too 
small.  The  “pass/phase”  column  indicates  approximately  how  much  of  the  entire 
file  is  being  rewound  during  each  half  of  a phase,  and  approximately  how  much 
of  the  file  is  being  written  during  each  full  phase.  The  tape  splitting  method  is 
superior  to  standard  polyphase  on  six  or  more  tapes , and  probably  also  on  five, 
at  least  for  large  S. 
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Table  6 

APPROXIMATE  BEHAVIOR  OF  POLYPHASE  MERGE  WITH  TAPE  SPLITTING 


Tapes 

Phases 

Passes 

Pass/phase 

Growth  ratio 

4 

2.885  In  S + 0.000 

1.443  In  S+  1.000 

50% 

1.4142136 

5 

2.078  In  S + 0.232 

0.929  In  S+  1.022 

45% 

1.6180340 

6 

2.078  In  S - 

0.170 

0.752  In  S + 1.024 

36% 

1.6180340 

7 

1.958  In  S - 

0.408 

0.670  In  S+  1.007 

34% 

1.6663019 

8 

2.008  In  S - 

0.762 

0.624  In  S + 0.994 

31% 

1.6454116 

9 

1 .972  In  S - 

0.987 

0.595  In  S + 0.967 

30% 

1.6604077 

10 

2.013  In  S - 

1.300 

0.580  In  S + 0.941 

29% 

1.6433803 

20 

2.069  In  S - 

3.164 

0.566  In  S + 0.536 

27% 

1.6214947 

When  T = 4 the  procedure  above  would  become  essentially  equivalent  to 
balanced  two-way  merging,  without  overlapping  the  rewind  time,  since  w2n+i 
would  be  0 for  all  n.  So  the  entries  in  Table  6 for  T — 4 have  been  obtained  by 
making  a slight  modification,  letting  v2  = 0,  ui  = 1,  V\  = 0,  u0  = 0,  vq  = 1* 
and  vn+x  = un_i  + wn_i,  un  — u„_ 2 + u„_2  for  n > 2.  This  leads  to  a very 
interesting  sorting  scheme  (see  exercises  25  and  26). 

EXERCISES 

1.  [16]  Figure  69  shows  the  order  in  which  runs  34  through  65  are  distributed  to  five 
tapes  with  Algorithm  D;  in  what  order  are  runs  1 through  33  distributed? 

► 2.  [21]  True  or  false:  After  two  merge  phases  in  Algorithm  D (that  is,  on  the  second 
time  we  reach  step  D6),  all  dummy  runs  have  disappeared. 

► 3.  [22]  Prove  that  the  condition  D[l]  > D[2]  > • • • > D[T]  is  always  satisfied  at  the 
conclusion  of  step  D4.  Explain  why  this  condition  is  important,  in  the  sense  that  the 
mechanism  of  steps  D2  and  D3  would  not  work  properly  otherwise. 

4.  [M20]  Derive  the  generating  functions  (7). 

5.  [HM26]  (E.  P.  Miles,  Jr.,  1960.)  For  all  p > 2,  prove  that  the  polynomial  fp{z)  = 
zp  — zp~1  — ■ ■ ■ — z — 1 has  p distinct  roots,  of  which  exactly  one  has  magnitude  greater 
than  unity.  [Hint:  Consider  the  polynomial  zp+1  — 2 zp  + 1.] 

6.  [HM24]  The  purpose  of  this  exercise  is  to  consider  how  Tables  1,  5,  and  6 were 
prepared.  Assume  that  we  have  a merging  pattern  whose  properties  are  characterized 
by  polynomials  p(z)  and  q(z)  in  the  following  way:  (i)  The  number  of  initial  runs  present 
in  a “perfect  distribution”  requiring  n merging  phases  is  [zn]  p(z)/q(z).  (ii)  The  number 
of  initial  runs  processed  during  these  n merging  phases  is  [zn]p(z) / q(z)2 . (iii)  There 
is  a “dominant  root”  a of  <?(z_1)  such  that  q(a~l)  = 0,  q'  (a-1)  / 0,  p(a~x)  / 0,  and 
q{fl~1)  = 0 implies  that  f3  = a or  |/3|  < |a|. 

Prove  that  there  is  a number  e > 0 such  that,  if  S is  the  number  of  runs  in  a 
perfect  distribution  requiring  n merging  phases,  and  if  pS  initial  runs  are  processed 
during  those  phases,  we  have  n = alnS  + b + 0(S~e)  and  p = clnS  + d + 0(S~e), 
where  , . v 

a = (lna)-1,  b=  -a  In  ( P _ /)  - 1,  c = a — Q , 

\-q'{a  !)7  -g'(«_1) 

d _ (b+l)a-p'(a~1)/p(g-1)  + q"(a~1)/q'{a~1) 

— <j'(a-1) 


286 


SORTING 


5.4.2 


7.  [HM22]  Let  ap  be  the  dominant  root  of  the  polynomial  fp(z)  in  exercise  5.  What 
is  the  asymptotic  behavior  of  ap  as  p — > oo? 

8.  [M20]  (E.  Netto,  1901.)  Let  be  the  number  of  ways  to  express  m as  an 
ordered  sum  of  the  integers  {1,  2, . . . ,p}.  For  example,  when  p = 3 and  m = 5,  there 
are  13  ways,  namely  l + l + l + l+l  = 1 + 1 + 1+2  = 1+1+2+1  = 1 + 1+3  = 1+2+1  + 1 = 

1+2  + 2 = 1+3  + 1 = 2 + 1 + 1 + 1 = 2 + 1 + 2 = 2 + 2 + 1 = 2 + 3 = 3 + 1 + 1 = 3 + 2. 

Show  that  Nm  is  a generalized  Fibonacci  number. 

9.  [M20]  Let  be  the  number  of  sequences  of  m Os  and  Is  such  that  there  are 
no  p consecutive  Is.  For  example,  when  p = 3 and  m = 5 there  are  24  such  sequences: 
00000,  00001,  00010,  00011,  00100,  00101,  00110,  01000,  01001, . . . , 11011.  Show  that 
Km  is  a generalized  Fibonacci  number. 

10.  [M27]  (Generalized  Fibonacci  number  system .)  Prove  that  every  nonnegative 
integer  n has  a unique  representation  as  a sum  of  distinct  pth  order  Fibonacci  numbers 
Fj  , for  j > p,  subject  to  the  condition  that  no  p consecutive  Fibonacci  numbers  are 
used. 


11.  [M24]  Prove  that  the  nth  element  of  the  string  Qx  in  (12)  is  equal  to  the  number 
of  distinct  Fibonacci  numbers  in  the  fifth-order  Fibonacci  representation  of  n - 1.  [See 
exercise  10.] 


► 12.  [Ml 8]  Find  a connection  between  powers  of  the  matrix 


the  perfect  Fibonacci  distributions  ir 


/° 

0 

0 

0 

1 


0 

0 

0 

1 

1/ 


and 


► 13.  [22]  Prove  the  following  rather  odd  property  of  perfect  Fibonacci  distributions: 
When  the  final  output  will  be  on  tape  number  T,  the  number  of  runs  on  each  other 
tape  is  odd-,  when  the  final  output  will  be  on  some  tape  other  than  T,  the  number  of 
runs  will  be  odd  on  that  tape,  and  it  will  be  even  on  the  others.  [See  (1).] 


14.  [M35]  Let  T„(x)  — Tnkxk , where  Tn(x)  is  the  polynomial  defined  in  (16). 

a)  Show  that  for  each  k there  is  a number  n(k)  such  that  Tlk  < T2k  < ■ ■■  < Tni fc)s.  > 
rT(n(k)  + l)k  ^ * * * • 

b)  Given  that  Tniki  < Tnk*  and  n < n,  prove  that  Tnik  < Tnk  for  all  k > k'. 

c)  Prove  that  there  is  a nondecreasing  sequence  ( Mn ) such  that  E n(S)  = min^iE^S) 
when  Mn  < S < Mn+i,  but  E n(S)  > min,>i  E j(S)  when  S > Mn+1.  [See  (19).] 

15.  [M43]  Prove  or  disprove:  E„-i(m)  < E„(m)  implies  that  E n(m)  < E n+i(m)  < 
E„+2 (m)  < ■ ■ ■ . [Such  a result  would  greatly  simplify  the  calculation  of  Table  2.] 


16.  [HM43]  Determine  the  asymptotic  behavior  of  the  polyphase  merge  with  optimum 
distribution  of  dummy  runs. 


17.  [32]  Prove  or  disprove:  There  is  a way  to  disperse  runs  for  an  optimum  polyphase 
distribution  in  such  a way  that  the  distribution  for  5+1  initial  runs  is  formed  by 
adding  one  run  (on  an  appropriate  tape)  to  the  distribution  for  S initial  runs. 

18.  [30]  Does  the  optimum  polyphase  distribution  produce  the  best  possible  merging 
pattern,  in  the  sense  that  the  total  number  of  initial  runs  processed  is  minimized,  if  we 
insist  that  the  initial  runs  be  placed  on  at  most  T — 1 of  the  tapes?  (Ignore  rewind  time.) 

19.  [21]  Make  a table  analogous  to  (1),  for  Caron’s  polyphase  sort  on  six  tapes. 
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20.  [M24]  What  generating  functions  for  Caron’s  polyphase  sort  on  six  tapes  corre- 
spond to  (7)  and  to  (16)?  What  relations,  analogous  to  (9)  and  (27),  define  the  strings 
of  merge  numbers? 

21.  [11]  What  should  appear  on  level  7 in  (26)? 

22.  [M21]  Each  term  of  the  sequence  (24)  is  approximately  equal  to  the  sum  of  the 
previous  two.  Does  this  phenomenon  hold  for  the  remaining  numbers  of  the  sequence? 
Formulate  and  prove  a theorem  about  tn  — tn- 1 — tn- 2. 

► 23.  [29]  What  changes  would  be  made  to  (25),  (27),  and  (28),  if  (23)  were  changed 
to  — Un  — l T Vn—  1 T Un—  2,  — Vn  — 2 d“  Un  — 3 T ^n  — 3 d-  Un— 4 d-  Vn—  4? 

24.  [HM41]  Compute  the  asymptotic  behavior  of  the  tape-splitting  polyphase  proce- 
dure, when  un+i  is  defined  to  be  the  sum  of  the  first  q terms  of  un- 1 d-  ?Jn-i  + • • • + 
un  — p -I-  vn-p , for  various  P = T — 2 and  for  0 < q < 2 P.  (The  text  treats  only  the 
case  q — 2|_.P/2J;  see  exercise  23.) 

25.  [19]  Show  how  the  tape-splitting  polyphase  merge  on  four  tapes,  mentioned  at 
the  end  of  this  section,  would  sort  32  initial  runs.  (Give  a phase-by-phase  analysis  like 
the  82-run  six-tape  example  in  the  text.) 

26.  [M21]  Analyze  the  behavior  of  the  tape-splitting  polyphase  merge  on  four  tapes, 
when  S = 2n  and  when  S = 2n  + 2n_1.  (See  exercise  25.) 

27.  [23]  Once  the  initial  runs  have  been  distributed  to  tapes  in  a perfect  distribution, 
the  polyphase  strategy  is  simply  to  “merge  until  empty”:  We  merge  runs  from  all 
nonempty  input  tapes  until  one  of  them  has  been  entirely  read;  then  we  use  that  tape 
as  the  next  output  tape,  and  let  the  previous  output  tape  serve  as  an  input. 

Does  this  merge-until-empty  strategy  always  sort,  no  matter  how  the  initial  runs 
are  distributed,  as  long  as  we  distribute  them  onto  at  least  two  tapes?  (One  tape  will, 
of  course,  be  left  empty  so  that  it  can  be  the  first  output  tape.) 

28.  [M26]  The  previous  exercise  defines  a rather  large  family  of  merging  patterns. 
Show  that  polyphase  is  the  best  of  them,  in  the  following  sense:  If  there  are  six  tapes, 
and  if  we  consider  the  class  of  all  initial  distributions  (a,  b,  c,  d , e)  such  that  the  merge- 
until-empty  strategy  requires  at  most  n phases  to  sort,  then  a+b+c+d+e  < tn, 
where  tn  is  the  corresponding  value  for  polyphase  sorting  (l). 

29.  [M47]  Exercise  28  shows  that  the  polyphase  distribution  is  optimal  among  all 
merge-until-empty  patterns  in  the  minimum-phase  sense.  But  is  it  optimal  also  in  the 
minimum-pass  sense? 

Let  a be  relatively  prime  to  b,  and  assume  that  a + b is  the  Fibonacci  number  Fn. 
Prove  or  disprove  the  following  conjecture  due  to  R.  M.  Karp:  The  number  of  initial 
runs  processed  during  the  merge-until-empty  pattern  starting  with  distribution  (a,  b) 
is  greater  than  or  equal  to  ((n  — 5)Fn+1  4-  (2 n + 2)Fn)/b.  (The  latter  figure  is  achieved 
when  a = Fn_  1,  b = Fn- 2-) 

30.  [42]  Prepare  a table  analogous  to  Table  2,  for  the  tape-splitting  polyphase  merge. 

31.  [M2 2]  (R.  Kemp.)  Let  Kd(n)  be  the  number  of  n-node  ordered  trees  in  which 
every  leaf  is  at  distance  d from  the  root.  For  example,  A'3(8)  = 7 because  of  the  trees 


Show  that  Kd(n)  is  a generalized  Fibonacci  number,  and  find  a one-to-one  correspon- 
dence between  such  trees  and  the  ordered  partitions  considered  in  exercise  8. 
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*5.4.3.  The  Cascade  Merge 

Another  basic  pattern,  called  the  “cascade  merge,”  was  actually  discovered 
before  polyphase  [B.  K.  Betz  and  W.  C.  Carter,  ACM  National  Meeting  14 
(1959),  Paper  14].  This  approach  is  illustrated  for  six  tapes  and  190  initial  runs 
in  the  following  table,  using  the  notation  developed  in  Section  5.4.2: 


Tl 

T2 

T3 

T4 

T5 

T6 

Initial  runs 

processed 

Pass  1 

j55 

^50 

l41 

^29 

l15 

— 

190 

Pass  2 

— 

*15 

29 

312 

414 

515 

190 

Pass  3 

15s 

144 

123 

92 

*54 

— 

190 

Pass  4 

— 

*154 

291 

411 

501 

55 1 

190 

Pass  5 

1901 

— 

— 

— 

— 

— 

190 

A cascade 

merge, 

like  polyphase, 

starts  out 

with  a 

“perfect 

distribution”  of 

runs  on  tapes,  although  the  rule  for  perfect  distributions  is  somewhat  different 
from  those  in  Section  5.4.2.  Each  line  in  the  table  represents  a complete  pass 
over  all  the  data.  Pass  2,  for  example,  is  obtained  by  doing  a five-way  merge 
from  {Tl,  T2,  T3,  T4,  T5}  to  T6,  until  T5  is  empty  (this  puts  15  runs  of  relative 
length  5 on  T6),  then  a four-way  merge  from  {Tl, T2, T3, T4}  to  T5,  then  a 
three-way  merge  to  T4,  a two-way  merge  to  T3,  and  finally  a one-way  merge 
(a  copying  operation)  from  Tl  to  T2.  Pass  3 is  obtained  in  the  same  way,  first 
doing  a five-way  merge  until  one  tape  becomes  empty,  then  a four-way  merge, 
and  so  on.  (Perhaps  the  present  section  of  this  book  should  be  numbered  5. 4.3. 2.1 
instead  of  5.4.3!) 

It  is  clear  that  the  copying  operations  are  unnecessary,  and  they  could  be 
omitted.  Actually,  however,  in  the  six-tape  case  this  copying  takes  only  a small 
percentage  of  the  total  time.  The  items  marked  with  an  asterisk  in  the  table 
above  are  those  that  were  simply  copied;  only  25  of  the  950  runs  processed  are 
of  this  type.  Most  of  the  time  is  devoted  to  five-way  and  four-way  merging. 


Table  1 

APPROXIMATE  BEHAVIOR  OF  CASCADE  MERGE  SORTING 


Tapes 

Passes  (with  copying) 

Passes  (without  copying) 

Growth  ratio 

3 

2.078  In  S + 0.672 

1.504  In  S + 0.992 

1.6180340 

4 

1.235  In  S + 0.754 

1.102  In  S + 0.820 

2.2469796 

5 

0.946  In  5 + 0.796 

0.897  In  S + 0.800 

2.8793852 

6 

0.796  In  S + 0.821 

0.773  In  S + 0.808 

3.5133371 

7 

0.703  In  S + 0.839 

0.691  In  S + 0.822 

4.1481149 

8 

0.639  In  S + 0.852 

0.632  In  S + 0.834 

4.7833861 

9 

0.592  In  S + 0.861 

0.5871nS  + 0.845 

5.4189757 

10 

0.555  In  S + 0.869 

0.552  In  S + 0.854 

6.0547828 

20 

0.397  In  S + 0.905 

0.397  In  S + 0.901 

12.4174426 

At  first  it  may  seem  that  the  cascade  pattern  is  a rather  poor  choice,  by 
comparison  with  polyphase,  since  standard  polyphase  uses  (T  — l)-way  merging 
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throughout  while  the  cascade  uses  (T  — l)-way,  (T  — 2)-way,  (T  — 3)-way,  etc. 
But  in  fact  it  is  asymptotically  better  than  polyphase,  on  six  or  more  tapes!  As 
we  have  observed  in  Section  5.4.2,  a high  order  of  merge  is  not  a guarantee  of 
efficiency.  Table  1 shows  the  performance  characteristics  of  cascade  merge,  by 
analogy  with  the  similar  tables  in  Section  5.4.2. 

The  “perfect  distributions”  for  a cascade  merge  are  easily  derived  by  working 
backwards  from  the  final  state  (1,0, ...  ,0).  With  six  tapes,  they  are 


Level 

T1 

T2 

T3 

T4 

T5 

0 

1 

0 

0 

0 

0 

1 

1 

1 

1 

1 

1 

2 

5 

4 

3 

2 

1 

3 

15 

14 

12 

9 

5 

4 

55 

50 

41 

29 

15 

5 

190 

175 

146 

105 

55 

n 

Cn 

dn 

n+1 

"I-  H-  dn  ^ n 

'‘f-  d n 

®"n  "1“  “t- 

It  is  interesting  to  note  that  the  relative  magnitudes  of  these  numbers  appear 
also  in  the  diagonals  of  a regular  (2T  — l)-sided  polygon.  For  example,  the  five 
diagonals  in  the  hendecagon  of  Fig.  73  have  relative  lengths  very  nearly  equal 
to  190,  175,  146,  105,  and  55!  We  shall  prove  this  remarkable  fact  later  in  this 
section,  and  we  shall  also  see  that  the  relative  amount  of  time  spent  in  (T— l)-way 
merging,  (T  — 2)-way  merging,  . . . , 1-way  merging  is  approximately  proportional 
to  the  squares  of  the  lengths  of  these  diagonals. 


Fig.  73.  Geometrical  interpretation  of  cascade  numbers. 

Initial  distribution  of  runs.  When  the  actual  number  of  initial  runs  isn’t 
perfect,  we  can  insert  dummy  runs  as  usual.  A superficial  analysis  of  this  situ- 
ation would  indicate  that  the  method  of  dummy  run  assignment  is  immaterial, 
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Fig.  74.  Efficiency  of  cascade  merge  with  the  distribution  of  Algorithm  D. 


since  cascade  merging  operates  by  complete  passes;  if  we  have  190  initial  runs, 
each  record  is  processed  five  times  as  in  the  example  above,  but  if  there  are  191 
we  must  apparently  go  up  a level  so  that  every  record  is  processed  six  times. 
Fortunately  this  abrupt  change  is  not  actually  necessary;  David  E.  Ferguson  has 
found  a way  to  distribute  initial  runs  so  that  many  of  the  operations  during  the 
first  merge  pass  reduce  to  copying  the  contents  of  a tape.  When  such  copying 
relations  are  bypassed  (by  simply  changing  “logical”  tape  unit  numbers  relative 
to  the  “physical”  numbers  as  in  Algorithm  5. 4. 2D),  we  obtain  a relatively  smooth 
transition  from  level  to  level,  as  shown  in  Fig.  74. 

Suppose  that  (a,  b,  c,  d,  e)  is  a perfect  distribution,  where  a > b > c > d > a. 
By  redefining  the  correspondence  between  logical  and  physical  tape  units,  we 
can  imagine  that  the  distribution  is  actually  (■ e,d,c,b,a ),  with  a runs  on  T5, 
b on  T4,  etc.  The  next  perfect  distribution  is  ( a+b+c+d+e , a+b+c+d,  a+b+c, 
a+b,  a);  and  if  the  input  is  exhausted  before  we  reach  this  next  level,  let  us 
assume  that  the  tapes  contain,  respectively,  (Di,  D2,  D3,  D4,  D5)  dummy  runs, 
where 

D\  < a + b + c + d,  D2  < a + b + c,  D3  < a + b,  D4  < a,  D$  = 0; 

D\  > D2  > D3  > D4  > D5.  (2) 
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We  are  free  to  imagine  that  the  dummy  runs  appear  in  any  convenient  place 
on  the  tapes.  The  first  merge  pass  is  supposed  to  produce  a runs  by  five-way 
merging,  then  b by  four- way  merging,  etc.,  and  our  goal  is  to  arrange  the  dummies 
so  as  to  replace  merging  by  copying.  It  is  convenient  to  do  the  first  merge  pass 
as  follows: 

1.  If  .D4  = a,  subtract  a from  each  of  Di,  D2,  D3,  H4  and  pretend  that 
T5  is  the  result  of  the  merge.  If  £>4  < a,  merge  a runs  from  tapes  T1  through 
T5,  using  the  minimum  possible  number  of  dummies  on  tapes  T1  through  T5  so 
that  the  new  values  of  D 1,  D2,  -D3,  D4  will  satisfy 

D\  ^ b T c T d,  D2  b * c.  D2  + 6,  H4  — 0; 

D\  > D2  > > D4.  (3) 

Thus,  if  D2  was  originally  < b + c,  we  use  no  dummies  from  it  at  this  step,  while 
\{  b + c < D2  < a + b + c we  use  exactly  D2  — b — c of  them. 

2.  (This  step  is  similar  to  step  1,  but  “shifted.”)  If  D3  — b,  subtract  b from 
each  of  D\,  D2,  D3  and  pretend  that  T4  is  the  result  of  the  merge.  If  D3  < b, 
merge  b runs  from  tapes  T1  through  T4,  reducing  the  number  of  dummies  if 
necessary  in  order  to  make 

D\  < c + d,  D2  < c,  D2  = 0;  D\  > D2  > D3. 

3.  And  so  on. 


Table  2 

EXAMPLE  OF  CASCADE  DISTRIBUTION  STEPS 


Add  to  Tl 

Add  to  T2 

Add  to  T3 

Add  to  T4 

Add  to  T5 

“Amount  saved” 

Step  (1,1) 

9 

0 

0 

0 

0 

15  + 14+12  + 5 

Step  (2,2) 

3 

12 

0 

0 

0 

15  + 14  + 9 + 5 

Step  (2,1) 

9 

0 

0 

0 

0 

15  + 14  + 5 

Step  (3,3) 

2 

2 

14 

0 

0 

15  + 12  + 5 

Step  (3,2) 

3 

12 

0 

0 

0 

15  + 9 + 5 

Step  (3,1) 

9 

0 

0 

0 

0 

15  + 5 

Step  (4,4) 

1 

1 

1 

15 

0 

14  + 5 

Step  (4,3) 

2 

2 

14 

0 

0 

12  + 5 

Step  (4,2) 

3 

12 

0 

0 

0 

9 + 5 

Step  (4,1) 

9 

0 

0 

0 

0 

5 

Ferguson’s  method  of  distributing  runs  to  tapes  can  be  illustrated  by  con- 
sidering the  process  of  going  from  level  3 to  level  4 in  (1).  Assume  that  “logical” 
tapes  (Tl, . . . ,T5)  contain  respectively  (5,9, 12, 14, 15)  runs  and  that  we  want 
eventually  to  bring  this  up  to  (55,  50, 41,  29, 15).  The  procedure  can  be  summa- 
rized as  shown  in  Table  2.  We  first  put  nine  runs  on  Tl,  then  (3,12)  on  Tl 
and  T2,  etc.  If  the  input  becomes  exhausted  during,  say,  Step  (3,2),  then  the 
“amount  saved”  is  15  + 9 + 5,  meaning  that  the  five-way  merge  of  15  runs,  the 
two-way  merge  of  9 runs,  and  the  one-way  merge  of  5 runs  are  avoided  by  the 
dummy  run  assignment.  In  other  words,  15  + 9 + 5 of  the  runs  present  at  level 
3 are  not  processed  during  the  first  merge  phase. 
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The  following  algorithm  defines  the  process  in  detail. 


Algorithm  C ( Cascade  merge  sorting  with  special  distribution).  This  algorithm 
takes  initial  runs  and  disperses  them  to  tapes,  one  run  at  a time,  until  the  supply 
of  initial  runs  is  exhausted.  Then  it  specifies  how  the  tapes  are  to  be  merged, 
assuming  that  there  are  T > 3 available  tape  units,  using  at  most  ( T — l)-way 
merging  and  avoiding  unnecessary  one-way  merging.  Tape  T may  be  used  to 
hold  the  input,  since  it  does  not  receive  any  initial  runs.  The  following  tables 
are  maintained: 


ACj],  1 <j<T: 


The  perfect  cascade  distribution  we  have  most  recently 
reached. 


AA[j] , 1 < j < T:  The  perfect  cascade  distribution  we  are  striving  for. 

1 < j < T:  Number  of  dummy  runs  assumed  to  be  present  on  logical 
tape  unit  number  j. 

Mtj],  1 < j < T:  Maximum  number  of  dummy  runs  desired  on  logical  tape 
unit  number  j. 


TAPE  [j] , 1 < j < T:  Number  of  the  physical  tape  unit  corresponding  to  logical 
tape  unit  number  j. 

Cl.  [Initialize.]  Set  A [A]  «-  AA[A]  <-  D[A]  0 for  2 < A < T;  and  set 
A [1]  <-  0,  AA  [1]  <-  1,  D [1]  <-  1.  Set  TAPE  [A:]  <-  k for  1 < A < T.  Finally 
set  * <—  T — 2,  j «—  1,  A <—  1,  l 0,  m t—  1,  and  go  to  step  C5.  (This 
maneuvering  is  one  way  to  get  everything  started,  by  jumping  right  into  the 
inner  loop  with  appropriate  settings  of  the  control  variables.) 


C2.  [Begin  new  level.]  (We  have  just  reached  a perfect  distribution,  and  since 
there  is  more  input  we  must  get  ready  for  the  next  level.)  Increase  l by  1.  Set 
A [A]  <-  AA  [A]  , for  1 < A < T;  then  set  AA  [T  - A]  <-  AA  [T  — A + 1]  -f  A [A] , 
for  A = 1,  2,  ...,  T — 1 in  this  order.  Set  (TAPE  [1]  ,...,  TAPE  [T-l]  ) <- 
(TAPE [T-1],...,TAPE[1]),  and  set  D [A]  <-  AA[A+1]  for  1 < A < T. 
Finally  set  i t—  1. 


C3.  [Begin  zth  sublevel.]  Set  j 4—  i.  (The  variables  i and  j represent  “Step 
(i,jf  in  the  example  shown  in  Table  2.) 

C4.  [Begin  Step  Set  A <-  j and  m <-  A [T  - j - 1] . If  m = 0 and  i = j. 

set  i <r-  T - 2 and  return  to  C3;  if  m = 0 and  i ± j , return  to  C2.  (Variable 
m represents  the  number  of  runs  to  be  written  onto  TAPE  [A] ; m — 0 occurs 
only  when  1 = 1.) 

C5.  [Input  to  TAPE  [A].]  Write  one  run  on  tape  number  TAPE  [A],  and  decrease 
D[A]  by  1.  Then  if  the  input  is  exhausted,  rewind  all  the  tapes  and  go  to 
step  C7. 


C6.  [Advance.]  Decrease  to  by  1.  If  m > 0,  return  to  C5.  Otherwise  decrease  A 
by  1;  if  A > 0,  set  to  «—  A [T  — j — 1]  — A [T  — j]  and  return  to  C5  if  m > 0. 
Otherwise  decrease  j by  1;  if  j > 0,  go  to  C4.  Otherwise  increase  i by  1;  if 
i < T — 1,  return  to  C3.  Otherwise  go  to  C2. 
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Fig.  75.  The  cascade  merge,  with  special  distribution. 

C7.  [Prepare  to  merge.]  (At  this  point  the  initial  distribution  is  complete,  and 
the  AA,  D,  and  TAPE  tables  describe  the  present  states  of  the  tapes.)  Set 
M[fc]  <-  AA[fc  + 1]  for  1 < k < T,  and  set  FIRST  <-  1.  (Variable  FIRST  is 
nonzero  only  during  the  first  merge  pass.) 

C8.  [Cascade.]  If  l — 0,  stop;  sorting  is  complete  and  the  output  is  on  TAPE[1] . 
Otherwise,  for  p = T — 1,  T — 2,  . . . , 1,  in  this  order,  do  a p-way  merge  from 
TAPE[1]  , . . . , TAPE[p]  to  TAPE[p  + 1]  as  follows: 

If  p — 1,  simulate  the  one-way  merge  by  simply  rewinding  TAPE  [2],  then 
interchanging  TAPE[1]  TAPE  [2] . 

Otherwise  if  FIRST  = 1 and  D [p  — 1]  — M [p  — 1]  , simulate  the  p-way  merge 
by  simply  interchanging  TAPE[p]  <-»  TAPE[p+  1],  rewinding  TAPE[p],  and 
subtracting  M[p  — 1]  from  each  of  D [1] , . . . , D [p-1]  , M[l] , . , , , M[p— 1]  . 
Otherwise,  subtract  M[p  — 1]  from  each  of  M[l]  , . , . , M[p  — 1]  . Then  merge 
one  run  from  each  TAPE[j]  such  that  1 < j < p and  D[j]  < M[j] ; subtract 
one  from  each  D[j]  such  that  1 < j < p and  D[j]  > M[j];  and  put  the 
output  run  on  TAPE  [p  + 1] . Continue  doing  this  until  TAPE  [p]  is  empty. 
Then  rewind  TAPE[p]  and  TAPE[p  + 1]  . 

C9.  [Down  a level.]  Decrease  l by  1,  set  FIRST  4-  0,  and  set  (TAPE[1] , . . . , 
TAPE[T])  (TAPE[T] , . . . ,TAPE[1] ).  (At  this  point  all  D’s  and  M’s  are 
zero  and  will  remain  so.)  Return  to  C8.  | 

Steps  C1-C6  of  this  algorithm  do  the  distribution,  and  steps  C7-C9  do  the 
merging;  the  two  parts  are  fairly  independent  of  each  other,  and  it  would  be 
possible  to  store  M [fc]  and  AA  [A-  + 1]  in  the  same  memory  locations. 
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Analysis  of  cascade  merging.  The  cascade  merge  is  somewhat  harder  to 
analyze  than  polyphase,  but  the  analysis  is  especially  interesting  because  so 
many  remarkable  formulas  are  present.  Readers  who  enjoy  discrete  mathematics 
are  urged  to  study  the  cascade  distribution  for  themselves,  before  reading  further, 
since  the  numbers  have  extraordinary  properties  that  are  a pleasure  to  discover. 
We  shall  discuss  here  one  of  the  many  ways  to  approach  the  analysis,  emphasizing 
the  way  in  which  the  results  might  be  discovered. 

For  convenience,  let  us  consider  the  six-tape  case,  looking  for  formulas  that 
generalize  to  all  T.  Relations  (1)  lead  to  the  first  basic  pattern: 


an  ~ an  — ^0Jan, 

^n— 1 

2 — (0) 

(2)an_ 2, 

cn  bn  dn  — 1 

&n— 2 ^n  — 2 = (o)^n 

(2)  2~^~  (4)  ^n— 4? 

(4) 

— cn  Cn—1 

C71  ^n  — 2 ^n— 2 Cn—2  = (o)^n 

(2)  ^n—  2~^~  (4)  ®n— 4 ( 

6\ 

Qjan—6i 

dn  bn  — 1 

dn  &n  — 2 bn  — 2 Cn—2  dn  — 2 ~~  (q)  d-n 

(2)  ^n— 2“l_  (4)  ^n— 4 ( 

g)  ^n— 6 “I"  (3)  ^n  — 8 

Let  A(z)  = En>0  , E{z)  = En>0 

enzn,  and  define  the  polynomials 

fc  k—0  K 

The  result  of  (4)  can  be  summarized  by  saying  that  the  generating  functions 
B(z)  - qi(z)A(z),  C(z)  - q2(z)A(z),  D(z ) - q3(z)A(z),  and  E(z)  - q4(z)A(z) 
reduce  to  finite  sums,  corresponding  to  the  values  of  a_i,  a_2,  a_3, . . . that  appear 
in  (4)  for  small  n but  do  not  appear  in  A(z).  In  order  to  supply  appropriate 
boundary  conditions,  let  us  run  the  recurrence  backwards  to  negative  levels, 
through  level  —8: 


n 

Un 

b-n 

e'n 

dn 

en 

0 

1 

0 

0 

0 

0 

-1 

0 

0 

0 

0 

1 

-2 

1 

-1 

0 

0 

0 

-3 

0 

0 

0 

-1 

2 

-4 

2 

-3 

1 

0 

0 

-5 

0 

0 

1 

-4 

5 

-6 

5 

-9 

5 

-1 

0 

-7 

0 

-1 

6 

-14 

14 

-8 

14 

-28 

20 

-7 

1 
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(On  seven  tapes  the  table  would  be  similar,  with  entries  for  odd  n shifted  right 
one  column.)  The  sequence  ao,  a_ 2l  a_ 4,  ...  = 1, 1,  2,  5, 14, ...  is  a dead  giveaway 
for  computer  scientists,  since  it  occurs  in  connection  with  so  many  recursive 
algorithms  (see,  for  example,  exercise  2.2. 1-4  and  Eq.  2.3.4.4-(i4));  therefore  we 
conjecture  that  in  the  T-tape  case 


a-2n  — 

a~2  n— 1 = 0, 


/ 2n\  1 

V n ) n + 1 ’ 


for  0 < n < T — 2; 
for  0 < n < T - 3. 


(6) 


To  verify  that  this  choice  is  correct,  it  suffices  to  show  that  (6)  and  (4)  yield  the 
correct  results  for  levels  0 and  1.  On  level  1 this  is  obvious,  and  on  level  0 we 
have  to  verify  that 


{ m\  fm+  1 \ / m + 2 \ / m + 3 \ 

Ur°~l  2 r~2+l  4 e ja-6+--- 

'2k\  (-l)*1 


= £ 

k>  0 


m + k 
2k 


k ) k + 1 


= 6. 


m 0 


(7) 


for  0 < m < T — 2.  Fortunately  this  sum  can  be  evaluated  by  standard  tech- 
niques; it  is,  in  fact,  Example  2 in  Section  1.2.6. 

Now  we  can  compute  the  coefficients  of  B(z)  — qi(z)A(z),  etc.  For  example, 
consider  the  coefficient  of  z2m  in  D(z)  — q^{z)A(yz):  It  is 


E( 

fc>0 


3 + m + k 
2m  + 2k 


(-1) 


m-f  k 


«-2  k 


'3  + m + k^  [Zk^  (— 1) 
2m  + 2k 


k 


) 


771  + fc 


gc 

=<-1>m+‘(22HT)' 

by the result of Example 3 in Section 1.2.6. Therefore we have deduced that

$$
\begin{aligned}
A(z) &= q_0(z)A(z), \\
B(z) &= q_1(z)A(z) - q_0(z), & C(z) &= q_2(z)A(z) - q_1(z), \\
D(z) &= q_3(z)A(z) - q_2(z), & E(z) &= q_4(z)A(z) - q_3(z).
\end{aligned}
\qquad (8)
$$

Furthermore we have e_{n+1} = a_n; hence zA(z) = E(z), and

$$A(z) = q_3(z)\big/\bigl(q_4(z) - z\bigr). \qquad (9)$$

The generating functions have now been derived in terms of the q polynomials,
and so we want to understand the q's better. Exercise 1.2.9-15 is useful
in this regard, since it gives us a closed form that may be written

$$q_m(z) = \frac{\bigl((\sqrt{4-z^2}+iz)/2\bigr)^{2m+1} + \bigl((\sqrt{4-z^2}-iz)/2\bigr)^{2m+1}}{\sqrt{4-z^2}}. \qquad (10)$$

Everything simplifies if we now set z = 2 sin θ:

$$q_m(2\sin\theta) = \frac{(\cos\theta + i\sin\theta)^{2m+1} + (\cos\theta - i\sin\theta)^{2m+1}}{2\cos\theta} = \frac{\cos(2m+1)\theta}{\cos\theta}. \qquad (11)$$

(This coincidence leads us to suspect that the polynomial q_m(z) is well known in
mathematics; and indeed, a glance at appropriate tables will show that q_m(z) is
essentially a Chebyshev polynomial of the second kind, namely (−1)^m U_{2m}(z/2)
in conventional notation.)
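The following Python sketch checks (9) and (11) numerically for six tapes. It assumes the expansion q_m(z) = Σ_k (−1)^k C(m+k, 2k) z^{2k}, which is what (−1)^m U_{2m}(z/2) expands to, and it generates a_n by the six-tape rule of exercise 4 ({a, b, c, d, e} → {a+b+c+d+e, a+b+c+d, a+b+c, a+b, a}), extended in the obvious way to T tapes. All helper names are hypothetical.

```python
import math
from math import comb

def q(m: int) -> list[int]:
    """Coefficients of q_m(z), lowest degree first (assumed expansion)."""
    coeffs = [0] * (2 * m + 1)
    for k in range(m + 1):
        coeffs[2 * k] = (-1) ** k * comb(m + k, 2 * k)
    return coeffs

def cascade_a(T: int, nterms: int) -> list[int]:
    """a_0, a_1, ... for T tapes: each new row holds the partial sums of the old one."""
    row, out = [1] + [0] * (T - 2), []
    for _ in range(nterms):
        out.append(row[0])
        row = [sum(row[: T - 1 - j]) for j in range(T - 1)]
    return out

def series_quotient(num: list[int], den: list[int], nterms: int) -> list[int]:
    """First nterms power-series coefficients of num(z)/den(z); needs den[0] == 1."""
    out = []
    for n in range(nterms):
        acc = num[n] if n < len(num) else 0
        acc -= sum(den[j] * out[n - j] for j in range(1, min(n, len(den) - 1) + 1))
        out.append(acc)
    return out

# (9) for T = 6: A(z) = q_3(z) / (q_4(z) - z).
den = q(4)
den[1] -= 1
A = series_quotient(q(3), den, 12)
assert A == cascade_a(6, 12)
print(A[:6])   # [1, 1, 5, 15, 55, 190]

# (11): q_m(2 sin t) = cos((2m+1)t) / cos t at a few sample angles.
for m in range(6):
    for t in (0.1, 0.7, 1.3):
        lhs = sum(c * (2 * math.sin(t)) ** i for i, c in enumerate(q(m)))
        assert math.isclose(lhs, math.cos((2 * m + 1) * t) / math.cos(t),
                            rel_tol=1e-9, abs_tol=1e-9)
```

The power-series division works because q_4(0) − 0 = 1, so the denominator is invertible as a formal series.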

We can now determine the roots of the denominator in (9): The equation
q_4(2 sin θ) = 2 sin θ reduces to

$$\cos 9\theta = 2\sin\theta\cos\theta = \sin 2\theta.$$

We can obtain solutions to this relation whenever ±9θ = 2θ + (2n − ½)π; and
all such θ yield roots of the denominator in (9) provided that cos θ ≠ 0. (When
cos θ = 0, q_m(±2) = (−1)^m(2m + 1) is never equal to ±2.) The following eight distinct
roots for q_4(z) − z = 0 are therefore obtained:

$$2\sin\tfrac{-5}{14}\pi,\quad 2\sin\tfrac{-1}{14}\pi,\quad 2\sin\tfrac{3}{14}\pi;\qquad 2\sin\tfrac{-7}{22}\pi,\quad 2\sin\tfrac{-3}{22}\pi,\quad 2\sin\tfrac{1}{22}\pi,\quad 2\sin\tfrac{5}{22}\pi,\quad 2\sin\tfrac{9}{22}\pi.$$

Since q_4(z) is a polynomial of degree 8, this accounts for all the roots. The first
three of these values make q_3(z) = 0, so q_3(z) and q_4(z) − z have a polynomial
of degree three as a common factor. The other five roots govern the asymptotic
behavior of the coefficients of A(z), if we expand (9) in partial fractions.
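To confirm the eight roots numerically, this sketch evaluates q_4(z) − z and q_3(z) at each of the listed values. It assumes q_m(z) = Σ_k (−1)^k C(m+k, 2k) z^{2k} (the expansion of (−1)^m U_{2m}(z/2)); it is a floating-point check rather than a derivation, and the tolerance is an arbitrary choice.

```python
import math
from math import comb

def q(m: int, z: float) -> float:
    """q_m(z), assuming q_m(z) = sum_k (-1)^k C(m+k, 2k) z^(2k)."""
    return sum((-1) ** k * comb(m + k, 2 * k) * z ** (2 * k) for k in range(m + 1))

common = [2 * math.sin(f * math.pi) for f in (-5 / 14, -1 / 14, 3 / 14)]
others = [2 * math.sin(f * math.pi) for f in (-7 / 22, -3 / 22, 1 / 22, 5 / 22, 9 / 22)]
TOL = 1e-9

for z in common + others:
    assert abs(q(4, z) - z) < TOL        # a root of the denominator of (9)
for z in common:
    assert abs(q(3, z)) < TOL            # also a root of the numerator q_3(z)
for z in others:
    assert abs(q(3, z)) > 1e-3           # survives into the partial fractions

# The largest reciprocal 1/z among the surviving roots governs the growth.
print(max(1 / z for z in others))        # about 3.5133
```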

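Considering the general T-tape case, let θ_k = (4k + 1)π/(4T − 2). The
generating function A(z) for the T-tape cascade distribution numbers takes the
form

$$A(z) = \frac{4}{2T-1}\sum_{-T/2<k<\lfloor T/2\rfloor}\frac{\cos^2\theta_k}{1 - z/(2\sin\theta_k)} \qquad (12)$$

(see exercise 8); hence

$$a_n = \frac{4}{2T-1}\sum_{-T/2<k<\lfloor T/2\rfloor}\cos^2\theta_k\left(\frac{1}{2\sin\theta_k}\right)^{n}. \qquad (13)$$

The equations in (8) now lead to the similar formulas

$$
\begin{aligned}
b_n &= \frac{4}{2T-1}\sum_{-T/2<k<\lfloor T/2\rfloor}\cos\theta_k\cos 3\theta_k\left(\frac{1}{2\sin\theta_k}\right)^{n},\\
c_n &= \frac{4}{2T-1}\sum_{-T/2<k<\lfloor T/2\rfloor}\cos\theta_k\cos 5\theta_k\left(\frac{1}{2\sin\theta_k}\right)^{n},\\
d_n &= \frac{4}{2T-1}\sum_{-T/2<k<\lfloor T/2\rfloor}\cos\theta_k\cos 7\theta_k\left(\frac{1}{2\sin\theta_k}\right)^{n},
\end{aligned}
\qquad (14)
$$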


and so on. Exercise 9 shows that these equations hold for all n ≥ 0, not only
for large n. In each sum the term for k = 0 dominates all the others, especially
when n is reasonably large; therefore the “growth ratio” is

$$\frac{1}{2\sin\theta_0} = \frac{2T}{\pi} - \frac{1}{\pi} + \frac{\pi}{48T} + O(T^{-2}). \qquad (15)$$
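A floating-point check of (13), (14), and (15) against cascade numbers computed directly by partial sums (the T-tape generalization of the rule in exercise 4 is an assumption of this sketch). The summation range −T/2 < k < ⌊T/2⌋ is written as `range(-((T - 1) // 2), T // 2)`; the function names are illustrative.

```python
import math

def cascade(T: int, nterms: int) -> list[list[int]]:
    """Rows (a_n, b_n, c_n, ...) for n = 0 .. nterms-1, built by partial sums."""
    row, rows = [1] + [0] * (T - 2), []
    for _ in range(nterms):
        rows.append(row)
        row = [sum(row[: T - 1 - j]) for j in range(T - 1)]
    return rows

def closed_form(T: int, n: int, r: int) -> float:
    """Right-hand side of (13) when r = 1, or of (14) when r = 3, 5, 7."""
    total = 0.0
    for k in range(-((T - 1) // 2), T // 2):      # -T/2 < k < floor(T/2)
        theta = (4 * k + 1) * math.pi / (4 * T - 2)
        total += math.cos(theta) * math.cos(r * theta) / (2 * math.sin(theta)) ** n
    return 4 * total / (2 * T - 1)

# (13): a_n for several numbers of tapes.
for T in range(3, 9):
    for n, row in enumerate(cascade(T, 12)):
        assert math.isclose(closed_form(T, n, 1), row[0], rel_tol=1e-9, abs_tol=1e-9)

# (14): b_n, c_n, d_n for six tapes.
for n, row in enumerate(cascade(6, 12)):
    for col, r in ((1, 3), (2, 5), (3, 7)):
        assert math.isclose(closed_form(6, n, r), row[col], rel_tol=1e-9, abs_tol=1e-9)

# (15): the exact growth ratio versus its three-term expansion.
for T in (3, 6, 10, 20):
    exact = 1 / (2 * math.sin(math.pi / (4 * T - 2)))
    approx = 2 * T / math.pi - 1 / math.pi + math.pi / (48 * T)
    print(T, round(exact, 6), round(approx, 6))
```

The gap between the two columns shrinks roughly like 1/T², as the O(T^{-2}) term in (15) predicts.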


Cascade  sorting  was  first  analyzed  by  W.  C.  Carter  [Proc.  IFIP  Congress 
(1962),  62-66],  who  obtained  numerical  results  for  small  T,  and  by  David  E. 
Ferguson  [see  CACM  7 (1964),  297],  who  discovered  the  first  two  terms  in  the 
asymptotic  behavior  (15)  of  the  growth  ratio.  During  the  summer  of  1964, 
R. W. Floyd discovered the explicit form 1/(2 sin θ_0) of the growth ratio, so that
exact  formulas  could  be  used  for  all  T.  An  intensive  analysis  of  the  cascade 
numbers  was  independently  carried  out  by  G.  N.  Raney  [Canadian  J.  Math.  18 
(1966),  332-349],  who  came  across  them  in  quite  another  way  having  nothing  to 
do  with  sorting.  Raney  observed  the  “ratio  of  diagonals”  principle  of  Fig.  73, 
and  derived  many  other  interesting  properties  of  the  numbers.  Floyd  and  Raney 
used  matrix  manipulations  in  their  proofs  (see  exercise  6). 


Modifications  of  cascade  sorting.  If  one  more  tape  is  added,  it  is  possible 
to  overlap  nearly  all  of  the  rewind  time  during  a cascade  sort.  For  example, 
we  can  merge  T1-T5  to  T7,  then  T1-T4  to  T6,  then  T1-T3  to  T5  (which  by 
now  is  rewound),  then  T1-T2  to  T4,  and  the  next  pass  can  begin  when  the 
comparatively  short  data  on  T4  has  been  rewound.  The  efficiency  of  this  process 
can  be  predicted  from  the  analysis  of  cascading.  (See  Section  5.4.6  for  further 
information.) 

A “compromise  merge”  scheme,  which  includes  both  polyphase  and  cascade 
as  special  cases,  was  suggested  by  D.  E.  Knuth  in  CACM  6 (1963),  585-587. 
Each phase consists of (T − 1)-way, (T − 2)-way, ..., P-way merges, where P
is any fixed number between 1 and T − 1. When P = T − 1, this is polyphase,
and when P = 1 it is pure cascade; when P = 2 it is cascade without copy
phases. Analyses of this scheme have been made by C. E. Radke [IBM Systems
J. 5 (1966), 226-247] and by W. H. Burge [Proc. IFIP Congress (1971), 1, 454-459].
Burge found the generating function Σ T_n(x)z^n for each (P, T) compromise
merge, generalizing Eq. 5.4.2-(16); he showed that the best value of P, from the
standpoint of fewest initial runs processed as a function of S as S → ∞ (using
a straightforward distribution scheme and ignoring rewind time), is respectively
(2, 3, 3, 4, 4, 4, 3, 3, 4) for T = (3, 4, 5, 6, 7, 8, 9, 10, 11). These values of P lean
more  towards  cascade  than  polyphase  as  T increases;  and  it  turns  out  that  the 
compromise  merge  is  never  substantially  better  than  cascade  itself.  On  the  other 
hand,  with  an  optimum  choice  of  levels  and  optimum  distribution  of  dummy 
runs,  as  described  in  Section  5.4.2,  pure  polyphase  seems  to  be  best  of  all  the 
compromise  merges;  unfortunately  the  optimum  distribution  is  comparatively 
difficult  to  implement. 

Th. L. Johnsen [BIT 6 (1966), 129-143] has studied a combination of balanced
and polyphase merging; a rewind-overlap variation of balanced merging
has been proposed by M. A. Goetz [Digital Computer User’s Handbook, edited
by  M.  Klerer  and  G.  A.  Korn  (New  York:  McGraw-Hill,  1967),  1.311-1.312]; 
and  many  other  hybrid  schemes  can  be  imagined. 
EXERCISES

1.  [10]  Using  Table  1,  compare  cascade  merging  with  the  tape-splitting  version  of 
polyphase  described  in  Section  5.4.2.  Which  is  better?  (Ignore  rewind  time.) 

► 2.  [22]  Compare  cascade  sorting  on  three  tapes,  using  Algorithm  C,  to  polyphase 
sorting  on  three  tapes,  using  Algorithm  5.4.2D.  What  similarities  and  differences  can 
you  find? 

3.  [23]  Prepare  a table  that  shows  what  happens  when  100  initial  runs  are  sorted 
on  six  tapes  using  Algorithm  C. 

4. [M20] (G. N. Raney.) An “nth level cascade distribution” is a multiset defined
as follows (in the case of six tapes): {1, 0, 0, 0, 0} is a 0th level cascade distribution;
and if {a, b, c, d, e} is an nth level cascade distribution, {a + b + c + d + e, a + b + c + d,
a + b + c, a + b, a} is an (n + 1)st level cascade distribution. (A multiset is unordered,
hence up to 5! different (n + 1)st level distributions can be formed from a single nth
level distribution.)

a)  Prove  that  any  multiset  {a,  b,  c,  d,  e}  of  relatively  prime  integers  is  an  nth  level 
cascade  distribution,  for  some  n. 

b) Prove that the distribution defined for cascade sorting is optimum, in the sense
that, if {a, b, c, d, e} is any nth level distribution with a ≥ b ≥ c ≥ d ≥ e, we have
a ≤ a_n, b ≤ b_n, c ≤ c_n, d ≤ d_n, e ≤ e_n, where (a_n, b_n, c_n, d_n, e_n) is the distribution
defined in (1).

► 5. [20] Prove that the cascade numbers defined in (1) satisfy the law

$$a_k a_{n-k} + b_k b_{n-k} + c_k c_{n-k} + d_k d_{n-k} + e_k e_{n-k} = a_n, \qquad \text{for } 0 \le k \le n.$$

[Hint: Interpret this relation by considering how many runs of various lengths are
output during the kth pass of a complete cascade sort.]

6. [M20] Find a 5 × 5 matrix Q such that the first row of Q^n contains the six-tape
cascade numbers a_n, b_n, c_n, d_n, e_n for all n ≥ 0.

7. [M20] Given that cascade merge is being applied to a perfect distribution of a_n
initial runs, find a formula for the amount of processing saved when one-way merging
is  suppressed. 

8.  [HM23]  Derive  (12). 

9.  [HM26]  Derive  (14). 

► 10. [M28] Instead of using the pattern (4) to begin the study of the cascade numbers,
start with the identities

$$
\begin{aligned}
e_n &= a_{n-1} = \binom{1}{1}a_{n-1},\\
d_n &= 2a_{n-1} - e_{n-2} = \binom{2}{1}a_{n-1} - \binom{3}{3}a_{n-3},\\
c_n &= 3a_{n-1} - d_{n-2} - 2e_{n-2} = \binom{3}{1}a_{n-1} - \binom{4}{3}a_{n-3} + \binom{5}{5}a_{n-5},
\end{aligned}
$$

etc. Letting

$$r_m(z) = \binom{m}{1}z - \binom{m+1}{3}z^3 + \binom{m+2}{5}z^5 - \cdots,$$

express A(z), B(z), etc. in terms of these r polynomials.

11. [M38] Let

$$f_m(z) = \sum_k \binom{\lfloor (m+k)/2 \rfloor}{k}(-1)^{\lceil k/2 \rceil} z^k.$$

Prove that the generating function A(z) for the T-tape cascade numbers is equal to
f_{T-3}(z)/f_{T-1}(z), where the numerator and denominator in this expression have no
common factor.

12. [M40] Prove that Ferguson’s distribution scheme is optimum, in the sense that
no  method  of  placing  the  dummy  runs,  satisfying  (2),  will  cause  fewer  initial  runs  to 
be  processed  during  the  first  pass,  provided  that  the  strategy  of  steps  C7-C9  is  used 
during  this  pass. 

13.  [40]  The  text  suggests  overlapping  most  of  the  rewind  time,  by  adding  an  extra 
tape.  Explore  this  idea.  (For  example,  the  text’s  scheme  involves  waiting  for  T4  to 
rewind;  would  it  be  better  to  omit  T4  from  the  first  merge  phase  of  the  next  pass?) 

*5.4.4.  Reading  Tape  Backwards 

Many  magnetic  tape  units  have  the  ability  to  read  tape  in  the  opposite  direction 
from  which  it  was  written.  The  merging  patterns  we  have  encountered  so  far 
always  write  information  onto  tape  in  the  “forward”  direction,  then  rewind  the 
tape,  read  it  forwards,  and  rewind  again.  The  tape  files  therefore  behave  as 
queues,  operating  in  a first-in-first-out  manner.  Backwards  reading  allows  us  to 
eliminate  both  of  these  rewind  operations:  We  write  the  tape  forwards  and  read 
it  backwards.  In  this  case  the  files  behave  as  stacks,  since  they  are  used  in  a 
last-in-first-out  manner. 

The  balanced,  polyphase,  and  cascade  merge  patterns  can  all  be  adapted  to 
backward  reading.  The  main  difference  is  that  merging  reverses  the  order  of  the 
runs  when  we  read  backwards  and  write  forwards.  If  two  runs  are  in  ascending 
order  on  tape,  we  can  merge  them  while  reading  backwards,  but  this  produces 
descending  order.  The  descending  runs  produced  in  this  way  will  subsequently 
become  ascending  on  the  next  pass;  so  the  merging  algorithms  must  be  capable 
of  dealing  with  runs  in  either  order.  Programmers  who  are  confronted  with 
read-backwards  for  the  first  time  often  feel  like  they  are  standing  on  their  heads! 

As  an  example  of  backwards  reading,  consider  the  process  of  merging  8 initial 
runs,  using  a balanced  merge  on  four  tapes.  The  operations  can  be  summarized 
as  follows: 


         T1          T2          T3      T4
Pass 1   A1A1A1A1    A1A1A1A1    -       -       Initial distribution
Pass 2   -           -           D2D2    D2D2    Merge to T3 and T4
Pass 3   A4          A4          -       -       Merge to T1 and T2
Pass 4   -           -           D8      -       Final merge to T3

Here A_r stands for a run of relative length r that appears on tape in ascending
order, if the tape is read forwards as in our previous examples; D_r is the
corresponding notation for a descending run of length r. During Pass 2 the
ascending  runs  become  descending:  They  appear  to  be  descending  in  the  input, 
since  we  are  reading  T1  and  T2  backwards.  Then  the  runs  switch  orientation 
again  on  Pass  3. 
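The effect of reading backwards can be seen in a small simulation. This sketch models each tape as a Python list used as a stack of runs (the run written last is read first), reads every input run from back to front, and merges with `heapq.merge`. The four-tape balanced schedule mirrors the tableau above; the random keys and helper names are illustrative assumptions.

```python
import heapq
import random

def direction(run):
    """'A' if the run, as written front to back, is ascending; otherwise 'D'."""
    return "A" if all(x < y for x, y in zip(run, run[1:])) else "D"

def merge_backward(runs):
    """Read every run from back to front, merge them, and return the output run
    in the order in which it is written (forwards)."""
    dirs = {direction(r) for r in runs}
    if len(dirs) != 1:
        raise ValueError("cannot merge runs that are in opposite directions")
    # An ascending run read backwards arrives in descending order, and vice versa.
    descending = dirs == {"A"}
    return list(heapq.merge(*(reversed(r) for r in runs), reverse=descending))

def balanced_read_backward(runs, tapes=4):
    """Balanced merge that always reads backwards. Each tape is a list used as
    a stack of runs: the run written last is the first one read."""
    half = tapes // 2
    tape = [[] for _ in range(tapes)]
    for i, run in enumerate(runs):                  # Pass 1: initial distribution
        tape[i % half].append(run)
    inputs, outputs = list(range(half)), list(range(half, tapes))
    while sum(len(t) for t in tape) > 1:
        k = 0
        while any(tape[i] for i in inputs):
            group = [tape[i].pop() for i in inputs if tape[i]]
            tape[outputs[k % half]].append(merge_backward(group))
            k += 1
        inputs, outputs = outputs, inputs           # output tapes become inputs
    where = next(i for i, t in enumerate(tape) if t)
    return where, tape[where][0]

random.seed(1)
keys = random.sample(range(1000), 32)
runs = [sorted(keys[i:i + 4]) for i in range(0, 32, 4)]   # 8 ascending runs
where, result = balanced_read_backward(runs)
print(f"T{where + 1}", direction(result), result == sorted(keys, reverse=True))
```

With eight ascending runs it prints `T3 D True`: the sorted output ends up on T3 in descending order, as in Pass 4 of the tableau.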

Notice that the process above finishes with the result on tape T3, in
descending order. If this is bad (depending on whether the output is to be read
backwards, or to be dismounted and put away for future use), we could copy it
to  another  tape,  reversing  the  direction.  A faster  way  would  be  to  rewind  T1 
and  T2  after  Pass  3,  producing  A8  during  Pass  4.  Still  faster  would  be  to  start 
with  eight  descending  runs  during  Pass  1,  since  this  would  interchange  all  the 
A’s and D’s. However, the balanced merge on 16 initial runs would require the
initial  runs  to  be  ascending;  and  we  usually  don’t  know  in  advance  how  many 
initial  runs  will  be  formed,  so  it  is  necessary  to  choose  one  consistent  direction. 
Therefore  the  idea  of  rewinding  after  Pass  3 is  probably  best. 

The  cascade  merge  carries  over  in  the  same  way.  For  example,  consider 
sorting  14  initial  runs  on  four  tapes: 


         T1              T2            T3        T4
Pass 1   A1A1A1A1A1A1    A1A1A1A1A1    A1A1A1    -
Pass 2   -               D1            D2D2      D3D3D3
Pass 3   A6              A5            A3        -
Pass 4   -               -             -         D14


Again, we could produce A14 instead of D14, if we rewound T1, T2, T3 just
before the final pass. This tableau illustrates a “pure” cascade merge, in the
sense that all of the one-way merges have been performed explicitly. If we had
suppressed the copying operations, as in Algorithm 5.4.3C, we would have been
confronted with the situation

         T1    T2    T3      T4
         A1    -     D2D2    D3D3D3

after Pass 2, and it would have been impossible to continue with a three-way
merge since we cannot merge runs that are in opposite directions! The operation
of copying T1 to T2 could be avoided if we rewound T1 and proceeded to read
it forward during the next merge phase (while reading T3 and T4 backwards).
But it would then be necessary to rewind T1 again after merging, so this trick
trades one copy for two rewinds.
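A run-level sketch of read-backward cascade merging on four tapes, tracking only each run's direction and relative length as (direction, length) pairs on list-based stacks. The pass structure (merge onto the empty tape until the shortest input empties, then onto that tape with one input fewer, and so on) is one reading of the cascade pattern, assumed here for illustration; `copy=False` suppresses the one-way merges and reproduces the dead end just described.

```python
def flip(d: str) -> str:
    return "D" if d == "A" else "A"

def show(tapes):
    """Render each tape bottom-to-top, e.g. 'D2D2'; '-' for an empty tape."""
    return ["".join(f"{d}{n}" for d, n in t) or "-" for t in tapes]

def merge(tapes, inputs, out):
    """Pop the top run of each input tape (reading backwards), check that all
    directions agree, and push the merged run, in the opposite direction, on out."""
    tops = [tapes[i].pop() for i in inputs]
    dirs = {d for d, _ in tops}
    if len(dirs) != 1:
        raise ValueError(f"runs in opposite directions on tapes {[i + 1 for i in inputs]}")
    tapes[out].append((flip(dirs.pop()), sum(n for _, n in tops)))

def cascade_pass(tapes, copy=True):
    """One cascade pass: merge onto the empty tape until the shortest input
    empties, then merge onto that tape with one input fewer, and so on.
    With copy=False the final one-way merges (copies) are skipped."""
    out = next(i for i, t in enumerate(tapes) if not t)
    inputs = sorted((i for i, t in enumerate(tapes) if t), key=lambda i: -len(tapes[i]))
    while inputs:
        if len(inputs) == 1 and not copy:
            break
        for _ in range(len(tapes[inputs[-1]])):
            merge(tapes, inputs, out)
        out = inputs.pop()                    # the shortest input is now empty

def initial():
    """14 ascending initial runs distributed (6, 5, 3, 0) as in Pass 1."""
    return [[("A", 1)] * 6, [("A", 1)] * 5, [("A", 1)] * 3, []]

tapes = initial()
for p in (2, 3, 4):
    cascade_pass(tapes)
    print(f"Pass {p}:", *show(tapes))

tapes = initial()
cascade_pass(tapes, copy=False)
print("Pass 2 without copying:", *show(tapes))
try:
    cascade_pass(tapes, copy=False)
except ValueError as err:
    print("Pass 3 fails:", err)
```

The first loop reproduces Passes 2 through 4 of the pure cascade tableau; the second run stops at the A1 left on T1 facing D runs on T3 and T4.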

Thus the distribution method of Algorithm 5.4.3C does not work as efficiently
for read-backwards as for read-forwards; the amount of time required jumps
rather sharply every time the number of initial runs passes a “perfect”
cascade distribution number. Another dispersion technique can be used to give a
smoother transition between perfect cascade distributions (see exercise 17).

Read-backward polyphase. At first glance (and even at second and third
glance), the polyphase merge scheme seems to be totally unfit for reading
backwards. For example, suppose that we have 13 initial runs and three tapes:

         T1            T2                  T3
Phase 1  A1A1A1A1A1    A1A1A1A1A1A1A1A1    -
Phase 2  -             A1A1A1              D2D2D2D2D2

Now  we’re  stuck;  we  could  rewind  either  T2  or  T3  and  then  read  it  forwards, 
while  reading  the  other  tape  backwards,  but  this  would  jumble  things  up  and 
we  would  have  gained  comparatively  little  by  reading  backwards. 
An ingenious idea that saves the situation is to alternate the direction of
runs  on  each  tape.  Then  the  merging  can  proceed  in  perfect  synchronization: 


         T1            T2                  T3
Phase 1  A1D1A1D1A1    D1A1D1A1D1A1D1A1    -
Phase 2  -             D1A1D1              D2A2D2A2D2
Phase 3  A3D3A3        -                   D2A2
Phase 4  A3            D5A5                -
Phase 5  -             D5                  D8
Phase 6  A13           -                   -

This  principle  was  mentioned  briefly  by  R.  L.  Gilstad  in  his  original  article  on 
polyphase  merging,  and  he  described  it  more  fully  in  CACM  6 (1963),  220-223. 
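This sketch replays the three-tape example with alternating directions. Tapes are stacks of (direction, length) pairs; each phase merges onto the empty tape, popping the most recently written run of every other tape and refusing to merge runs whose directions disagree. The helper names are illustrative. Starting every run as ascending instead reproduces the earlier dead end.

```python
def flip(d: str) -> str:
    return "D" if d == "A" else "A"

def distribute(counts, first_dirs):
    """Put count runs of length 1 on each tape, alternating A and D,
    starting with the given direction (the first run is at the bottom)."""
    tapes = []
    for count, d in zip(counts, first_dirs):
        runs = []
        for _ in range(count):
            runs.append((d, 1))
            d = flip(d)
        tapes.append(runs)
    return tapes

def show(tapes):
    return ["".join(f"{d}{n}" for d, n in t) or "-" for t in tapes]

def polyphase_read_backward(tapes):
    """Merge until one run remains; return a snapshot after every phase."""
    history = [show(tapes)]
    while sum(len(t) for t in tapes) > 1:
        out = next(i for i, t in enumerate(tapes) if not t)     # the empty tape
        inputs = [i for i, t in enumerate(tapes) if t]
        for _ in range(min(len(tapes[i]) for i in inputs)):
            tops = [tapes[i].pop() for i in inputs]             # read backwards
            dirs = {d for d, _ in tops}
            if len(dirs) != 1:
                raise ValueError("runs in opposite directions")
            tapes[out].append((flip(dirs.pop()), sum(n for _, n in tops)))
        history.append(show(tapes))
    return history

for phase, row in enumerate(polyphase_read_backward(
        distribute([5, 8, 0], ["A", "D", "A"])), start=1):
    print(f"Phase {phase}:", *row)

try:
    polyphase_read_backward(distribute([5, 8, 0], ["A", "A", "A"]))
except ValueError as err:
    print("All-ascending start:", err)
```

The printed phases match the tableau row for row, ending with A13 on T1.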

The ADA... technique works properly for polyphase merging on any
number of tapes; for we can show that the A’s and D’s will be properly synchronized
at each phase, provided only that the initial distribution pass produces
alternating A’s and D’s on each tape and that each tape ends with A (or each tape
ends with D): Since the last run written on the output file during one phase is
in the opposite direction from the last runs used from the input files, the next
phase always finds its runs in the proper orientation. Furthermore we have seen
in exercise 5.4.2-13 that most of the perfect Fibonacci distributions call for an
odd number of runs on one tape (the eventual output tape), and an even number
of runs on each other tape. If T1 is designated as the final output tape, we can
therefore guarantee that all tapes end with an A run, if we start T1 with an A
and let the remaining tapes start with a D. A distribution method analogous to
Algorithm 5.4.2D can be used, modified so that the distributions on each level
have T1 as the final output tape. (We skip levels 1, T+1, 2T+1, ..., since they
are the levels in which the initially empty tape is the final output tape.) For
example, in the six-tape case, we can use the following distribution numbers in
place of 5.4.2-(1):


Level    T1     T2     T3     T4     T5    Total    Final output will be on
  0       1      0      0      0      0       1     T1
  2       1      2      2      2      2       9     T1
  3       3      4      4      4      2      17     T1
  4       7      8      8      6      4      33     T1
  5      15     16     14     12      8      65     T1
  6      31     30     28     24     16     129     T1
  8      61    120    116    108     92     497     T1

Thus,  T1  always  gets  an  odd  number  of  runs,  while  T2  through  T5  get  the  even 
numbers,  in  decreasing  order  for  flexibility  in  dummy  run  assignment.  Such  a 
distribution  has  the  advantage  that  the  final  output  tape  is  known  in  advance, 
regardless  of  the  number  of  initial  runs  that  happen  to  be  present.  It  turns  out 
(see  exercise  3)  that  the  output  will  always  appear  in  ascending  order  on  T1 
when  this  scheme  is  used. 
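As a numerical check of the table (not a substitute for exercise 3), this sketch distributes each row with T6 initially empty, starting T1 with an A run and every other tape with a D run, runs read-backward polyphase on (direction, length) stacks, and reports the tape, direction, and length of the single final run. The function and table names are illustrative.

```python
def flip(d: str) -> str:
    return "D" if d == "A" else "A"

def final_run(counts):
    """Distribute runs (T1 starting with A, every other tape with D, alternating
    on each tape), merge by read-backward polyphase until one run is left, and
    return (tape number, direction, length) of that run."""
    tapes = []
    for j, count in enumerate(counts):
        d, runs = ("A" if j == 0 else "D"), []
        for _ in range(count):
            runs.append((d, 1))
            d = flip(d)
        tapes.append(runs)
    while sum(len(t) for t in tapes) > 1:
        out = next(i for i, t in enumerate(tapes) if not t)
        inputs = [i for i, t in enumerate(tapes) if t]
        for _ in range(min(len(tapes[i]) for i in inputs)):
            tops = [tapes[i].pop() for i in inputs]
            dirs = {x for x, _ in tops}
            if len(dirs) != 1:
                raise ValueError("runs in opposite directions")
            tapes[out].append((flip(dirs.pop()), sum(n for _, n in tops)))
    j = next(i for i, t in enumerate(tapes) if t)
    return j + 1, tapes[j][0][0], tapes[j][0][1]

TABLE = {  # level: runs on T1..T5, copied from the table; T6 starts empty
    0: (1, 0, 0, 0, 0),
    2: (1, 2, 2, 2, 2),
    3: (3, 4, 4, 4, 2),
    4: (7, 8, 8, 6, 4),
    5: (15, 16, 14, 12, 8),
    6: (31, 30, 28, 24, 16),
    8: (61, 120, 116, 108, 92),
}
for level, row in TABLE.items():
    print(level, final_run(list(row) + [0]))
```

Every row ends with one ascending run on T1 containing all the initial runs.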
Another way to handle the distribution for read-backward polyphase has
been suggested by D. T. Goodwin and J. L. Venn [CACM 7 (1964), 315]. We
can distribute runs almost as in Algorithm 5.4.2D, beginning with a D run on
each  tape.  When  the  input  is  exhausted,  a dummy  A run  is  imagined  to  be 
at  the  beginning  of  the  unique  “odd”  tape,  unless  a distribution  with  all  odd 
numbers  has  been  reached.  Other  dummies  are  imagined  at  the  end  of  the 
tapes,  or  grouped  into  pairs  in  the  middle.  The  question  of  optimum  placement 
of  dummy  runs  is  analyzed  in  exercise  5 below. 

Optimum  merge  patterns.  So  far  we  have  been  discussing  various  patterns 
for  merging  on  tape,  without  asking  for  “best  possible”  methods.  It  appears 
to be quite difficult to determine the optimal patterns, especially in the
read-forward case where the interaction of rewind time with merge time is hard to
handle.  On  the  other  hand,  when  merging  is  done  by  reading  backwards  and 
writing  forwards,  all  rewinding  is  essentially  eliminated,  and  it  is  possible  to 
get  a fairly  good  characterization  of  optimal  ways  to  merge.  Richard  M.  Karp 
has  introduced  some  very  interesting  approaches  to  this  problem,  and  we  shall 
conclude  this  section  by  discussing  the  theory  he  has  developed. 

In  the  first  place  we  need  a more  satisfactory  way  to  describe  merging 
patterns,  instead  of  the  rather  mysterious  tape-content  tableaux  that  have  been 
used  above.  Karp  has  suggested  two  ways  to  do  this,  the  vector  representation 
and  the  tree  representation  of  a merge  pattern.  Both  forms  of  representation  are 
useful  in  practice,  so  we  shall  describe  them  in  turn. 

The vector representation of a merge pattern consists of a sequence of “merge
vectors” y^{(m)} ... y^{(1)} y^{(0)}, each of which has T components. The ith-last merge
step is represented by y^{(i)} in the following way:

$$
y_j^{(i)} = \begin{cases}
+1, & \text{if tape number } j \text{ is an input to the merge;}\\
0, & \text{if tape number } j \text{ is not used in the merge;}\\
-1, & \text{if tape number } j \text{ gets the output of the merge.}
\end{cases}
\qquad (2)
$$

Thus, exactly one component of y^{(i)} is −1, and the other components are 0s and
1s. The final vector y^{(0)} is special; it is a unit vector, having 1 in position j if the
final sorted output appears on unit j, and 0 elsewhere. These definitions imply
that the vector sum

$$v^{(i)} = y^{(i)} + y^{(i-1)} + \cdots + y^{(0)} \qquad (3)$$

represents the distribution of runs on tape just before the ith-last merge step,
with v_j^{(i)} runs on tape j. In particular, v^{(m)} tells how many runs the initial
distribution pass places on each tape.

It may seem awkward to number these vectors backwards, with y^{(m)} coming
first and y^{(0)} last, but this peculiar viewpoint turns out to be advantageous for
developing the theory. One good way to search for an optimal method is to start
with the sorted output and to imagine “unmerging” it to various tapes, then
unmerging these, etc., considering the successive distributions v^{(0)}, v^{(1)}, v^{(2)}, ...
in the reverse order from which they actually occur during the sorting process.
In fact that is essentially the approach we have taken already in our analysis of
polyphase and cascade merging.

The  three  merge  patterns  described  in  tabular  form  earlier  in  this  section 
have  the  following  vector  representations: 


Balanced (T = 4, S = 8)           Cascade (T = 4, S = 14)            Polyphase (T = 3, S = 13)
v^{(7)}  = ( 4,  4,  0,  0)       v^{(10)} = ( 6,  5,  3,  0)        v^{(12)} = ( 5,  8,  0)
y^{(7)}  = (+1, +1, -1,  0)       y^{(10)} = (+1, +1, +1, -1)        y^{(12)} = (+1, +1, -1)
y^{(6)}  = (+1, +1,  0, -1)       y^{(9)}  = (+1, +1, +1, -1)        y^{(11)} = (+1, +1, -1)
y^{(5)}  = (+1, +1, -1,  0)       y^{(8)}  = (+1, +1, +1, -1)        y^{(10)} = (+1, +1, -1)
y^{(4)}  = (+1, +1,  0, -1)       y^{(7)}  = (+1, +1, -1,  0)        y^{(9)}  = (+1, +1, -1)
y^{(3)}  = (-1,  0, +1, +1)       y^{(6)}  = (+1, +1, -1,  0)        y^{(8)}  = (+1, +1, -1)
y^{(2)}  = ( 0, -1, +1, +1)       y^{(5)}  = (+1, -1,  0,  0)        y^{(7)}  = (-1, +1, +1)
y^{(1)}  = (+1, +1, -1,  0)       y^{(4)}  = (-1, +1, +1, +1)        y^{(6)}  = (-1, +1, +1)
y^{(0)}  = ( 0,  0,  1,  0)       y^{(3)}  = ( 0, -1, +1, +1)        y^{(5)}  = (-1, +1, +1)
                                  y^{(2)}  = ( 0,  0, -1, +1)        y^{(4)}  = (+1, -1, +1)
                                  y^{(1)}  = (+1, +1, +1, -1)        y^{(3)}  = (+1, -1, +1)
                                  y^{(0)}  = ( 0,  0,  0,  1)        y^{(2)}  = (+1, +1, -1)
                                                                     y^{(1)}  = (-1, +1, +1)
                                                                     y^{(0)}  = ( 1,  0,  0)

Every merge pattern obviously has a vector representation. Conversely, it is
easy to see that the sequence of vectors y^{(m)} ... y^{(1)} y^{(0)} corresponds to an actual
merge pattern if and only if the following three conditions are satisfied:

i) y^{(0)} is a unit vector.

ii) y^{(i)} has exactly one component equal to −1, all other components equal to
0 or +1, for m ≥ i ≥ 1.

iii) All components of y^{(i)} + ··· + y^{(1)} + y^{(0)} are nonnegative, for m ≥ i ≥ 1.
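Conditions (i) to (iii) are easy to test mechanically. In this sketch each vector is encoded as a string ('+' = +1, '-' = −1, '0' = 0, and '1' for the unit entry of y^{(0)}), listed from y^{(m)} down to y^{(0)} as transcribed from the three patterns above; the encoding and function names are illustrative assumptions. The function also returns v^{(m)}, the initial distribution.

```python
SIGN = {"+": 1, "-": -1, "0": 0, "1": 1}

def parse(s):
    return [SIGN[c] for c in s]

PATTERNS = {  # y^(m), ..., y^(1), y^(0)
    "balanced": ["++-0", "++0-", "++-0", "++0-", "-0++", "0-++", "++-0", "0010"],
    "cascade": ["+++-", "+++-", "+++-", "++-0", "++-0", "+-00",
                "-+++", "0-++", "00-+", "+++-", "0001"],
    "polyphase": ["++-"] * 5 + ["-++"] * 3 + ["+-+"] * 2 + ["++-", "-++", "100"],
}

def check(vectors):
    """Verify conditions (i)-(iii) and return v^(m), the initial distribution."""
    *merges, y0 = [parse(s) for s in vectors]
    assert sorted(y0) == [0] * (len(y0) - 1) + [1], "(i) fails"
    v = y0[:]
    for y in reversed(merges):                      # y^(1), y^(2), ..., y^(m)
        assert y.count(-1) == 1 and set(y) <= {-1, 0, 1}, "(ii) fails"
        v = [a + b for a, b in zip(v, y)]           # v^(i) = y^(i) + ... + y^(0)
        assert min(v) >= 0, "(iii) fails"
    return v

for name, vectors in PATTERNS.items():
    print(name, check(vectors))
```

The returned vectors (4, 4, 0, 0), (6, 5, 3, 0), and (5, 8, 0) agree with v^{(7)}, v^{(10)}, and v^{(12)} in the listing.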

The  tree  representation  of  a merge  pattern  gives  another  picture  of  the  same 
information.  We  construct  a tree  with  one  external  leaf  node  for  each  initial 
run,  and  one  internal  node  for  each  run  that  is  merged,  in  such  a way  that  the 
descendants  of  each  internal  node  are  the  runs  from  which  it  was  fabricated. 
Each  internal  node  is  labeled  with  the  step  number  on  which  the  corresponding 
run  was  formed,  numbering  steps  backwards  as  in  the  vector  representation; 
furthermore,  the  line  just  above  each  node  is  labeled  with  the  name  of  the  tape 
on  which  that  run  appears.  For  example,  the  three  merge  patterns  above  have 
the  tree  representations  depicted  in  Fig.  76,  if  we  call  the  tapes  A,  B,  C,  D 
instead of T1, T2, T3, T4.

This  representation  displays  many  of  the  relevant  properties  of  the  merge 
pattern  in  convenient  form;  for  example,  if  the  run  on  level  0 of  the  tree  (the 
root)  is  to  be  ascending,  then  the  runs  on  level  1 must  be  descending,  those 
on  level  2 must  be  ascending,  etc.;  an  initial  run  is  ascending  if  and  only  if  the 
corresponding  external  node  is  on  an  even-numbered  level.  Furthermore  the  total 
number  of  initial  runs  processed  during  the  merging  (not  including  the  initial 
distribution)  is  exactly  equal  to  the  external  path  length  of  the  tree,  since  each 
initial  run  on  level  k is  processed  exactly  k times. 
Fig. 76. Tree representations of three merge patterns.


Every  merge  pattern  has  a tree  representation,  but  not  every  tree  defines  a 
merge  pattern.  A tree  whose  internal  nodes  have  been  labeled  with  the  numbers 
1 through  m,  and  whose  lines  have  been  labeled  with  tape  names,  represents  a 
valid  read-backward  merge  pattern  if  and  only  if 

a)  no  two  lines  adjacent  to  the  same  internal  node  have  the  same  tape  name; 

b) if i > j, and if A is a tape name, the tree does not contain the configuration

[diagram not reproduced]

c) if i < j < k < l, and if A is a tape name, the tree does not contain both
members of either of two pairs of configurations

[diagrams not reproduced]  (4)
Condition (a) is self-evident, since the input and output tapes in a merge must be
distinct;  similarly,  (b)  is  obvious.  The  “no  crossover”  condition  (c)  mirrors  the 
last-in-first-out  restriction  that  characterizes  read-backward  operations  on  tape: 
The  run  formed  at  step  k must  be  removed  before  any  runs  formed  previously  on 
that  same  tape;  hence  the  configurations  in  (4)  are  impossible.  It  is  not  difficult 
to  verify  that  any  labeled  tree  satisfying  conditions  (a),  (b),  (c)  does  indeed 
correspond  to  a read-backward  merge  pattern. 

If there are T tape units, condition (a) implies that the degree of each
internal node is T − 1 or less. It is not always possible to attach suitable labels
to all such trees; for example, when T = 3 there is no merge pattern whose tree
has the shape

[diagram: the complete binary tree with four external nodes]  (5)


This  shape  would  lead  to  an  optimal  merge  pattern  if  we  could  attach  step 
numbers  and  tape  names  in  a suitable  way,  since  it  is  the  only  way  to  achieve 
the  minimum  external  path  length  in  a tree  having  four  external  nodes.  But 
there  is  essentially  only  one  way  to  do  the  labeling  according  to  conditions  (a) 
and (b), because of the symmetries of the diagram, namely,

[labeled diagram not reproduced]

and this violates condition (c). A shape that can be labeled according to the
conditions  above,  using  at  most  T tape  names,  is  called  a T-lifo  tree. 

Another  way  to  characterize  all  labeled  trees  that  can  arise  from  merge 
patterns  is  to  consider  how  all  such  trees  can  be  “grown.”  Start  with  some  tape 
name,  say  A,  and  with  the  seedling 


Step number i in the tree’s growth consists of choosing distinct tape names
B, B_1, B_2, ..., B_k, and changing the most recently formed external node
corresponding to B as follows: it becomes internal node i, whose k children are
new external nodes on lines labeled B_1, B_2, ..., B_k.

[diagram not reproduced]  (7)


This  “last  formed,  first  grown  on”  rule  explains  how  the  tree  representation  can 
be  constructed  directly  from  the  vector  representation. 
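The “last formed, first grown on” rule can be run directly on the vector representations. This sketch (string encoding and names are illustrative assumptions, with vectors transcribed from the listing above) grows the tree from y^{(0)}, y^{(1)}, ..., y^{(m)} using one stack of external nodes per tape, then independently carries out the merges forwards on stacks of run lengths; the external path length should equal the number of initial runs processed.

```python
SIGN = {"+": 1, "-": -1, "0": 0, "1": 1}

PATTERNS = {  # y^(m), ..., y^(1), y^(0)
    "balanced": ["++-0", "++0-", "++-0", "++0-", "-0++", "0-++", "++-0", "0010"],
    "cascade": ["+++-", "+++-", "+++-", "++-0", "++-0", "+-00",
                "-+++", "0-++", "00-+", "+++-", "0001"],
    "polyphase": ["++-"] * 5 + ["-++"] * 3 + ["+-+"] * 2 + ["++-", "-++", "100"],
}

def vectors_of(name):
    """Return ([y^(m), ..., y^(1)], y^(0)) as integer lists."""
    *merges, y0 = [[SIGN[c] for c in s] for s in PATTERNS[name]]
    return merges, y0

def grow_tree(name):
    """Grow the tree from y^(0), y^(1), ..., y^(m) by 'last formed, first grown
    on'; return the depths of the external nodes (the initial runs)."""
    merges, y0 = vectors_of(name)
    stacks = [[] for _ in y0]              # depths of external nodes, per tape
    stacks[y0.index(1)].append(0)          # the seedling: the final output run
    for y in reversed(merges):             # y^(1), y^(2), ..., y^(m)
        depth = stacks[y.index(-1)].pop()  # most recently formed node on tape B
        for j, c in enumerate(y):
            if c == 1:                     # a new external child on tape B_j
                stacks[j].append(depth + 1)
    return [d for s in stacks for d in s]

def runs_processed(name):
    """Carry out the merges forwards (steps m, ..., 1) on stacks of run lengths
    and count the initial runs processed, excluding the initial distribution."""
    merges, y0 = vectors_of(name)
    v = [sum(col) for col in zip(y0, *merges)]       # v^(m)
    stacks = [[1] * n for n in v]
    total = 0
    for y in merges:                                 # y^(m) comes first
        size = sum(stacks[j].pop() for j, c in enumerate(y) if c == 1)
        stacks[y.index(-1)].append(size)
        total += size
    return total

for name in PATTERNS:
    depths = grow_tree(name)
    print(name, len(depths), sum(depths), runs_processed(name))
```

For the three examples the two totals agree: 24, 42, and 50 initial runs processed.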

The  determination  of  strictly  optimum  T-tape  merge  patterns  — that  is,  of 
T-lifo  trees  whose  path  length  is  minimum  for  a given  number  of  external  nodes  — 
seems  to  be  quite  difficult.  For  example,  the  following  nonobvious  pattern  turns 
out  to  be  an  optimum  way  to  merge  seven  initial  runs  on  four  tapes,  reading 
backwards: 


A one-way  merge  is  actually  necessary  to  achieve  the  optimum!  (See  exercise  8.) 
On the other hand, it is not so difficult to give constructions that are
asymptotically optimal, for any fixed T.

Let K_T(n) be the minimum external path length achievable in a T-lifo tree
with n external nodes. From the theory developed in Section 2.3.4.5, it is not
difficult to prove that

$$K_T(n) \ge nq - \left\lfloor \frac{(T-1)^q - n}{T-2} \right\rfloor, \qquad q = \lceil \log_{T-1} n \rceil, \qquad (9)$$

since this is the minimum external path length of any tree with n external nodes
and all nodes of degree < T. At the present time comparatively few values of
K_T(n) are known exactly. Here are some upper bounds that are probably exact:

n =         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
K_3(n) ≤    0   2   5   9  12  16  21  25  30  34  39  45  50  56  61
K_4(n) ≤    0   2   3   6   8  11  14  17  20  24  27  31  33  37  40      (10)
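The lower bound (9) is easy to tabulate. This sketch computes q = ⌈log_{T−1} n⌉ with exact integer arithmetic (avoiding floating-point logarithms) and compares the bound with the upper bounds in (10); the names are illustrative.

```python
def lower_bound(T: int, n: int) -> int:
    """Right-hand side of (9), for T >= 3 and n >= 1."""
    t, q = T - 1, 0
    while t ** q < n:                  # q = ceil(log_{T-1} n), in exact integers
        q += 1
    return n * q - (t ** q - n) // (T - 2)

TABLE = {  # upper bounds on K_T(n) for n = 1, ..., 15, from (10)
    3: [0, 2, 5, 9, 12, 16, 21, 25, 30, 34, 39, 45, 50, 56, 61],
    4: [0, 2, 3, 6, 8, 11, 14, 17, 20, 24, 27, 31, 33, 37, 40],
}

for T, uppers in TABLE.items():
    for n, ub in enumerate(uppers, start=1):
        lb = lower_bound(T, n)
        assert lb <= ub
        print(f"T={T} n={n:2d}: {lb:2d} <= K_T(n) <= {ub:2d}")
```

For T = 3 and n = 4 the bound is 8 while the table lists 9, in line with the discussion of shape (5); for T = 4 and n = 7 the bound is 13 against 14 in the table.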

Karp  discovered  that  any  tree  whose  internal  nodes  have  degrees  < T is 
almost  T-lifo,  in  the  sense  that  it  can  be  made  T-lifo  by  changing  some  of  the 
external  nodes  to  one-way  merges.  In  fact,  the  construction  of  a suitable  labeling 
is  fairly  simple.  Let  A be  a particular  tape  name,  and  proceed  as  follows: 
Step 1. Attach tape names to the lines of the tree diagram, in any manner
consistent  with  condition  (a)  above,  provided  that  the  special  name  A is  used 
only  in  the  leftmost  line  of  a branch. 

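Step 2. Replace each external node of the form [an external node on a line
labeled B] by [a one-way merge: an internal node on line B whose only child is
an external node on a line labeled A], whenever B ≠ A.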

Step  3.  Number  the  internal  nodes  of  the  tree  in  preorder.  The  result  will  be  a 
labeling  satisfying  conditions  (a),  (b),  and  (c). 

For  example,  if  we  start  with  the  tree 


[tree diagram not reproduced]  (11)

and three tapes, this procedure might assign labels as follows:

[labeled tree diagram not reproduced]  (12)


It  is  not  difficult  to  verify  that  Karp’s  construction  satisfies  the  “last  formed, 
first  grown  on”  discipline,  because  of  the  nature  of  preorder  (see  exercise  12). 
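Karp's three steps are short enough to program. This is a sketch under stated assumptions: trees are nested Python lists with `None` for external nodes, tape names come from 'A', 'B', ..., and in Step 1 it makes one particular choice among the labelings the text allows (A on every permitted leftmost line, then the first unused names). It then checks the result by putting every initial run on tape A, executing steps m, m − 1, ..., 1 with one stack per tape, and asserting that each merge finds its inputs on top.

```python
from itertools import count

def karp_label(tree, T, names="ABCDEFGH"):
    """Karp's Steps 1-3 for a tree (nested lists, None = external node) whose
    internal nodes have at most T - 1 children. Nodes become dicts with a
    'line' (tape name), 'kids' (None for external nodes) and, if internal, 'step'."""
    A, others = names[0], names[1:T]

    def build(sub, line):
        if sub is None:
            leaf = {"line": A, "kids": None}
            # Step 2: an external node on a line other than A becomes a one-way merge.
            return leaf if line == A else {"line": line, "kids": [leaf]}
        assert len(sub) <= T - 1, "node degree exceeds T - 1"
        free = [x for x in others if x != line]       # Step 1: names for the lines
        first = A if line != A else free.pop(0)       # A only on a leftmost line
        lines = [first] + free[: len(sub) - 1]
        return {"line": line, "kids": [build(s, x) for s, x in zip(sub, lines)]}

    root = build(tree, others[0])
    steps = count(1)

    def number(node):                                 # Step 3: preorder numbering
        if node["kids"] is not None:
            node["step"] = next(steps)
            for kid in node["kids"]:
                number(kid)

    number(root)
    return root

def simulate(root):
    """Put every initial run on tape A, perform steps m, m-1, ..., 1 with one
    stack per tape, and check that each merge finds its inputs on top.
    Returns (m, external path length)."""
    internal, depths = [], []

    def walk(node, depth):
        if node["kids"] is None:
            depths.append(depth)
        else:
            internal.append(node)
            for kid in node["kids"]:
                walk(kid, depth + 1)

    walk(root, 0)
    internal.sort(key=lambda nd: nd["step"], reverse=True)      # step m first
    # Initial runs leave tape A in execution order, so they are pushed in reverse.
    pop_order = [k for nd in internal for k in nd["kids"] if k["kids"] is None]
    tapes = {"A": pop_order[::-1]}
    for nd in internal:
        for kid in nd["kids"]:
            stack = tapes.get(kid["line"], [])
            assert stack and stack[-1] is kid, "last-in-first-out order violated"
            stack.pop()
        tapes.setdefault(nd["line"], []).append(nd)
    assert [len(s) for s in tapes.values() if s] == [1]         # only the final run
    return len(internal), sum(depths)

print(simulate(karp_label([[None, None], [None, None]], T=3)))   # (6, 11)
print(simulate(karp_label([None, [None] * 3, [None] * 3], T=4)))  # (7, 17)
```

On shape (5) the construction inserts three one-way merges, giving path length 11; that is within S = 4 of the lower bound 8, although the table shows that 9 is achievable.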

The  result  of  this  construction  is  a merge  pattern  for  which  all  of  the  initial 
runs  appear  on  tape  A.  This  suggests  the  following  distribution  and  sorting 
scheme,  which  we  may  call  the  preorder  merge: 
P1. Distribute initial runs onto tape A until the input is exhausted. Let S be
the total number of initial runs.

P2. Carry out the construction above, using a minimum-path-length (T − 1)-ary
tree with S external nodes, obtaining a T-lifo tree whose external path
length is within S of the lower bound in (9).

P3.  Merge  the  runs  according  to  this  pattern.  | 

This  scheme  will  produce  its  output  on  any  desired  tape.  But  it  has  one  serious 
flaw— -does  the  reader  see  what  will  go  wrong?  The  problem  is  that  the  merge 
pattern  requires  some  of  the  runs  initially  on  tape  A to  be  ascending,  and  some  to 
be  descending,  depending  on  whether  the  corresponding  external  node  appears 
on  an  odd  or  an  even  level.  This  problem  can  be  resolved  without  knowing  S 
in  advance,  by  copying  runs  that  should  be  descending  onto  an  auxiliary  tape 
or  tapes,  just  before  they  are  needed.  Then  the  total  amount  of  processing,  in 
terms  of  initial  run  lengths,  comes  to 

$$S\log_{T-1} S + O(S). \qquad (13)$$

Thus the preorder merge is definitely better than polyphase or cascade, as
S → ∞; indeed, it is asymptotically optimum, since (9) shows that $S\log_{T-1} S +
O(S)$ is the best we could ever hope to achieve on T tapes. On the other
hand, for the comparatively small values of S that usually arise in practice, the
preorder merge is rather inefficient; polyphase or cascade methods are simpler
and faster, when S is reasonably small. Perhaps it will be possible to invent a
simple distribution-and-merge scheme that is competitive with polyphase and
cascade for small S, and that is asymptotically optimum for large S.
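With unit-length runs, the merging work in step P2's tree equals that tree's external path length. The minimal sketch below computes the smallest such path length with Huffman's method on equal weights. It assumes that a "minimum-path-length (T − 1)-ary tree" may have internal nodes with fewer than T − 1 children; that is why zero-weight padding is used. The sketch prints the result next to $S\log_{T-1} S$ from (13). It does not model the extra copying of descending runs mentioned above. The function name is hypothetical.

```python
import heapq
import math


def min_external_path_length(S: int, degree: int) -> int:
    """Smallest external path length of a tree with S external nodes whose internal
    nodes have at most `degree` children, found by Huffman's method with unit weights."""
    if S < 1 or degree < 2:
        raise ValueError("need S >= 1 and degree >= 2")
    weights = [1] * S
    # Zero-weight padding lets every merge take exactly `degree` items; the padding
    # contributes nothing, so it only models internal nodes with fewer children.
    while (len(weights) - 1) % (degree - 1) != 0:
        weights.append(0)
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = sum(heapq.heappop(weights) for _ in range(degree))
        total += merged          # each real leaf below this node gets one level deeper
        heapq.heappush(weights, merged)
    return total


T = 5
for S in (16, 17, 64, 100):
    print(S, min_external_path_length(S, T - 1), round(S * math.log(S, T - 1), 1))
```

When S is an exact power of T − 1, the two columns agree. In between, the difference stays within O(S), as the asymptotic argument above requires.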

The  second  set  of  exercises  below  shows  how  Karp  has  formulated  the 
question  of  read-forward  merging  in  a similar  way.  The  theory  turns  out  to 
be  rather  more  complicated  in  this  case,  although  some  very  interesting  results 
have  been  discovered. 

EXERCISES  — First  Set 

1. [17] It is often convenient, during read-forward merging, to mark the end of each
run on tape by including an artificial sentinel record whose key is +∞. How should
this practice be modified, when reading backwards?

2.  [20]  Will  the  columns  of  an  array  like  (1)  always  be  nondecreasing,  or  is  there  a 
chance  that  we  will  have  to  “subtract”  runs  from  some  tape  as  we  go  from  one  level  to 
the  next? 

► 3. [20] Prove that when read-backward polyphase merging is used with the perfect
distributions of (1), we will always obtain an A run on tape T1 when sorting is complete,
if T1 originally starts with ADA . . . and T2 through T5 start with DAD . . . .

4.  [M22]  Is  it  a good  idea  to  do  read-backward  polyphase  merging  after  distributing 
all  runs  in  ascending  order,  imagining  all  the  D positions  to  be  initially  filled  with 
dummies? 

► 5. [23] What formulas for the strings of merge numbers replace (8), (9), (10), and
(11) of Section 5.4.2, when read-backward polyphase merging is used? Show the
merge numbers for the fifth level distribution on six tapes, by drawing a diagram
like Fig. 71(a).

6. [07] What is the vector representation of the merge pattern whose tree
representation is (8)?

7.  [16]  Draw  the  tree  representation  for  the  read-backward  merge  pattern  defined 
by  the  following  sequence  of  vectors: 

$$\begin{aligned}
v^{(33)} &= (20, 9, 5) \\
y^{(33)} &= (+1,-1,+1) & \qquad y^{(16)} &= (+1,+1,-1) \\
y^{(32)} &= (+1,+1,-1) & \qquad y^{(15)} &= (+1,+1,-1) \\
y^{(31)} &= (+1,+1,-1) & \qquad y^{(14)} &= (+1,-1,+1) \\
y^{(30)} &= (+1,+1,-1) & \qquad y^{(13)} &= (+1,-1,+1) \\
y^{(29)} &= (+1,-1,+1) & \qquad y^{(12)} &= (-1,+1,+1) \\
y^{(28)} &= (-1,+1,+1) & \qquad y^{(11)} &= (+1,+1,-1) \\
y^{(27)} &= (+1,-1,+1) & \qquad y^{(10)} &= (+1,+1,-1) \\
y^{(26)} &= (+1,-1,+1) & \qquad y^{(9)} &= (+1,-1,+1) \\
y^{(25)} &= (+1,+1,-1) & \qquad y^{(8)} &= (+1,+1,-1) \\
y^{(24)} &= (+1,-1,+1) & \qquad y^{(7)} &= (+1,+1,-1) \\
y^{(23)} &= (+1,-1,+1) & \qquad y^{(6)} &= (+1,+1,-1) \\
y^{(22)} &= (+1,-1,+1) & \qquad y^{(5)} &= (-1,+1,+1) \\
y^{(21)} &= (-1,+1,+1) & \qquad y^{(4)} &= (+1,-1,+1) \\
y^{(20)} &= (+1,+1,-1) & \qquad y^{(3)} &= (-1,+1,+1) \\
y^{(19)} &= (-1,+1,+1) & \qquad y^{(2)} &= (+1,-1,+1) \\
y^{(18)} &= (+1,+1,-1) & \qquad y^{(1)} &= (-1,+1,+1) \\
y^{(17)} &= (+1,+1,-1) & \qquad y^{(0)} &= (1, 0, 0)
\end{aligned}$$

8. [23] Prove that (8) is an optimum way to merge, reading backwards, when S = 7
and T = 4, and that all methods that avoid one-way merging are inferior.

9. [M22] Prove the lower bound (9).

10. [41] Prepare a table of the exact values of $K_T(n)$, using a computer.

► 11. [20] True or false: Any read-backward merge pattern that uses nothing but
(T − 1)-way merging must always have the runs alternating AD AD . . . on each tape;
it will not work if two adjacent runs appear in the same order.

12. [22] Prove that Karp’s preorder construction always yields a labeled tree satisfying
conditions (a), (b), and (c).

13. [16] Make (12) more efficient, by removing as many of the one-way merges as
possible so that preorder still gives a valid labeling of the internal nodes.

14. [40] Devise an algorithm that carries out the preorder merge without explicitly
representing the tree in steps P2 and P3, using only O(log S) words of memory to
control the merging pattern.

15. [M39] Karp’s preorder construction in the text yields trees with one-way merges at
several terminal nodes. Prove that when T = 3 it is possible to construct asymptotically
optimal 3-lifo trees in which two-way merging is used throughout.

In other words, let $K_T(n)$ be the minimum external path length over all T-lifo
trees with n external nodes, such that every internal node has degree T − 1. Prove that
$K_3(n) = n\lg n + O(n)$.

16. [M46] In the notation of exercise 15, is $K_T(n) = n\log_{T-1} n + O(n)$ for all T > 3,
when $n \equiv 1 \pmod{T-2}$?
► 17. [28] (Richard D. Pratt.) To achieve ascending order in a read-backward cascade
merge, we could insist on an even number of merging passes; this suggests a technique
of initial distribution that is somewhat different from Algorithm 5.4.3C.

a) Change 5.4.3-(1) so that it shows only the perfect distributions that require an
even number of merging passes.

b) Design an initial distribution scheme that interpolates between these perfect
distributions. (Thus, if the number of initial runs falls between perfect distributions,
it is desirable to merge some, but not all, of the runs twice, in order to reach a
perfect distribution.)

► 18. [M38] Suppose that T tape units are available, for some T > 3, and that T1
contains N records while the remaining tapes are empty. Is it possible to reverse the
order of the records on T1 in fewer than Ω(N log N) steps, without reading backwards?
(The operation is, of course, trivial if backwards reading is allowed.) See exercise
5.2.5-14 for a class of such algorithms that do require order N log N steps.

EXERCISES  — Second  Set 

The following exercises develop the theory of tape merging on read-forward tapes; in
this case each tape acts as a queue instead of as a stack. A merge pattern can be
represented as a sequence of vectors $y^{(m)} \ldots$ exactly as in the text, but when
we convert the vector representation to a tree representation we change “last formed,
first grown on” to “first formed, first grown on.” Thus the invalid configurations (4)
would be changed to

[Diagrams: the read-forward counterparts of the invalid configurations (4)]

A tree that can be labeled so as to represent a read-forward merge on T tapes is called
T-fifo, analogous to the term “T-lifo” in the read-backward case.

When tapes can be read backwards, they make very good stacks. But unfortunately
they don’t make very good general-purpose queues. If we randomly write and
read, in a first-in-first-out manner, we waste a lot of time moving from one part of the
tape to another. Even worse, we will soon run off the end of the tape! We run into the
same problem as the queue overrunning memory in 2.2.2-(4) and (5), but the solution
in 2.2.2-(6) and (7) doesn’t apply to tapes since they aren’t circular loops. Therefore
we shall call a tree strongly T-fifo if it can be labeled so that the corresponding merge
pattern makes each tape follow the special queue discipline “write, rewind, read all,
rewind; write, rewind, read all, rewind; etc.”

► 19.  [22]  (R.  M.  Karp.)  Find  a binary  tree  that  is  not  3-fifo. 

► 20.  [22]  Formulate  the  condition  “strongly  T-fifo”  in  terms  of  a fairly  simple  rule 
about  invalid  configurations  of  tape  labels,  analogous  to  (4'). 

21. [18] Draw the tree representation for the read-forward merge pattern defined by
the vectors in exercise 7. Is this tree strongly 3-fifo?

22.  [28]  (R.  M.  Karp.)  Show  that  the  tree  representations  for  polyphase  and  cascade 
merging  with  perfect  distributions  are  exactly  the  same  for  both  the  read-backward 
and  the  read-forward  case,  except  for  the  numbers  that  label  the  internal  nodes.  Find 
a larger  class  of  vector  representations  of  merging  patterns  for  which  this  is  true. 




23. [24] (R. M. Karp.) Let us say that a segment $y^{(q)} \ldots y^{(r)}$ of a merge pattern is a
stage if no output tape is subsequently used as an input tape; that is, if there do not
exist i, j, k with $q \ge i > k \ge r$, $y_j^{(i)} = -1$, and $y_j^{(k)} = +1$. The purpose of this exercise
is to prove that cascade merge minimizes the number of stages, over all merge patterns
having the same number of tapes and initial runs.

It is convenient to define some notation. Let us write $v \to w$ if v and w are
T-vectors such that w reduces to v in the first stage of some merge pattern. (Thus there
is a merge pattern $y^{(m)} \ldots y^{(0)}$ such that $y^{(m)} \ldots y^{(l+1)}$ is a stage, $w = y^{(m)} + \cdots + y^{(0)}$,
and $v = y^{(l)} + \cdots + y^{(0)}$.) Let us write $v \preceq w$ if v and w are T-vectors such that
the sum of the largest k elements of v is ≤ the sum of the largest k elements of w, for
$1 \le k \le T$. Thus, for example, $(2,1,2,2,2,1) \preceq (1,2,3,0,3,1)$, since $2 \le 3$, $2+2 \le 3+3$,
. . . , $2+2+2+2+1+1 \le 3+3+2+1+1+0$. Finally, if $v = (v_1, \ldots, v_T)$, let
$C(v) = (s_T, s_{T-2}, s_{T-3}, \ldots, s_1, 0)$, where $s_k$ is the sum of the largest k elements of v.

a) Prove that $v \to C(v)$.

b) Prove that $v \preceq w$ implies $C(v) \preceq C(w)$.

c) Assuming the result of exercise 24, prove that cascade merge minimizes the number
of stages.
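Both the relation ≼ and the operator C(v) can be evaluated directly from their definitions. The sketch below does that; its function names are hypothetical. Iterating C from u = (1, 0, …, 0) yields the distributions that part (c) compares against. The loop assumes T = 6. Its first three results are (1, 1, 1, 1, 1, 0), (5, 4, 3, 2, 1, 0), and (15, 14, 12, 9, 5, 0), which should coincide with the perfect six-tape cascade distributions of Section 5.4.3.

```python
from typing import List, Sequence, Tuple


def largest_prefix_sums(v: Sequence[int]) -> List[int]:
    """Element k-1 is s_k, the sum of the k largest elements of v."""
    sums, total = [], 0
    for x in sorted(v, reverse=True):
        total += x
        sums.append(total)
    return sums


def precedes(v: Sequence[int], w: Sequence[int]) -> bool:
    """The relation v ≼ w of exercise 23 (v and w must have the same length T)."""
    return all(a <= b for a, b in zip(largest_prefix_sums(v), largest_prefix_sums(w)))


def C(v: Sequence[int]) -> Tuple[int, ...]:
    """C(v) = (s_T, s_{T-2}, s_{T-3}, ..., s_1, 0)."""
    s, T = largest_prefix_sums(v), len(v)
    return (s[T - 1],) + tuple(s[k - 1] for k in range(T - 2, 0, -1)) + (0,)


v = (1, 0, 0, 0, 0, 0)           # u = (1, 0, ..., 0) with T = 6
for level in range(4):
    v = C(v)
    print(v, sum(v))
```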

24. [M35] In the notation of exercise 23, prove that $v \to w$ implies $w \preceq C(v)$.

25. [M36] (R. M. Karp.) Let us say that a segment $y^{(q)} \ldots y^{(r)}$ of a merge pattern
is a phase if no tape is used both for input and for output; that is, if there do not
exist i, j, k with $q \ge i \ge r$, $q \ge k \ge r$, $y_j^{(i)} = +1$, and $y_j^{(k)} = -1$. The purpose of this
exercise is to investigate merge patterns that minimize the number of phases. We shall
write $v \Rightarrow w$ if w can be reduced to v in one phase (a similar notation was introduced
in exercise 23); and we let

$$D_k(v) = (s_k + t_{k+1},\ s_k + t_{k+2},\ \ldots,\ s_k + t_T,\ 0, \ldots, 0),$$

where $t_j$ denotes the jth largest element of v and $s_k = t_1 + \cdots + t_k$.

a) Prove that $v \Rightarrow D_k(v)$ for $1 \le k < T$.

b) Prove that $v \preceq w$ implies $D_k(v) \preceq D_k(w)$, for $1 \le k < T$.

c) Prove that $v \Rightarrow w$ implies $w \preceq D_k(v)$, for some k, $1 \le k < T$.

d) Consequently, a merge pattern that sorts the maximum number of initial runs on
T tapes in q phases can be represented by a sequence of integers $k_1 k_2 \ldots k_q$, such
that the initial distribution is $D_{k_q}(\ldots(D_{k_2}(D_{k_1}(u)))\ldots)$, where $u = (1, 0, \ldots, 0)$.
This minimum-phase strategy has a strongly T-fifo representation, and it also
belongs to the class of patterns in exercise 22. When T = 3 it is the polyphase
merge, and for T = 4, 5, 6, 7 it is a variation of the balanced merge.
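The operator $D_k$ can be sketched in the same way; the names `D` and `distribution` are hypothetical. With T = 3 and k = 1 at every phase, the totals grow as Fibonacci numbers, which agrees with the statement that the T = 3 case is the polyphase merge. The T = 4 call uses $k_1 k_2 k_3 = 1\,2\,2$ only as an illustration of the notation. It is not a claim about which sequence is optimal.

```python
from typing import Sequence, Tuple


def D(k: int, v: Sequence[int]) -> Tuple[int, ...]:
    """D_k(v) = (s_k + t_{k+1}, ..., s_k + t_T, 0, ..., 0), t_j = j-th largest element of v."""
    T = len(v)
    if not 1 <= k < T:
        raise ValueError("need 1 <= k < T")
    t = sorted(v, reverse=True)          # t[0] is t_1, the largest element
    s_k = sum(t[:k])
    return tuple(s_k + t[j] for j in range(k, T)) + (0,) * k


def distribution(ks: Sequence[int], T: int) -> Tuple[int, ...]:
    """Apply D_{k_1}, then D_{k_2}, ..., starting from u = (1, 0, ..., 0)."""
    v: Tuple[int, ...] = (1,) + (0,) * (T - 1)
    for k in ks:
        v = D(k, v)
    return v


print(distribution([1, 1, 1, 1], 3))   # T = 3 with k = 1 in every phase
print(distribution([1, 2, 2], 4))      # T = 4 with k_1 k_2 k_3 = 1 2 2
```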

26. [M46] (R. M. Karp.) Is the optimum sequence $k_1 k_2 \ldots k_q$ mentioned in exercise 25
equal to $1\,\lceil T/2\rceil\,\lfloor T/2\rfloor\,\lceil T/2\rceil\,\lfloor T/2\rfloor \ldots$, for all T > 4 and all sufficiently large q?

*5.4.5.  The  Oscillating  Sort 

A somewhat  different  approach  to  merge  sorting  was  introduced  by  Sheldon 
Sobel  in  JACM  9 (1962),  372-375.  Instead  of  starting  with  a distribution  pass 
where  all  the  initial  runs  are  dispersed  to  tapes,  he  proposed  an  algorithm  that 
oscillates  back  and  forth  between  distribution  and  merging,  so  that  much  of  the 
sorting  takes  place  before  the  input  has  been  completely  examined. 




Suppose,  for  example,  that  there  are  five  tapes  available  for  merging.  Sobel’s 
method  would  sort  16  initial  runs  as  follows: 


Operation              T1      T2      T3      T4      T5      Cost
Phase 1   Distribute   A1      A1      A1      A1      —          4
Phase 2   Merge        —       —       —       —       D4         4
Phase 3   Distribute   —       A1      A1      A1      D4A1       4
Phase 4   Merge        D4      —       —       —       D4         4
Phase 5   Distribute   D4A1    —       A1      A1      D4A1       4
Phase 6   Merge        D4      D4      —       —       D4         4
Phase 7   Distribute   D4A1    D4A1    —       A1      D4A1       4
Phase 8   Merge        D4      D4      D4      —       D4         4
Phase 9   Merge        —       —       —       A16     —         16

Here, as in Section 5.4.4, we use $A_r$ and $D_r$ to stand respectively for ascending
and descending runs of relative length r. The method begins by writing an initial
run onto each of four tapes, and merges them (reading backwards) onto the fifth
tape. Distribution resumes again, this time cyclically shifted one place to the
right with respect to the tapes, and a second merge produces another run D4.
When four D4’s have been formed in this way, an additional merge creates A16.
We could go on to create three more A16’s, merging them into a D64, and so on
until the input is exhausted. It isn’t necessary to know the length of the input
in advance.
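The cost column of the table follows a simple recursion. The sketch below assumes every initial run has relative length 1 and that rewinding is free; `sobel_cost` is a hypothetical name. A run of length $P^k$ costs P runs of length $P^{k-1}$ plus one P-way merge of total length $P^k$. For P = 4, this gives 8 for Phases 1 and 2, and 48 for the whole 16-run example.

```python
def sobel_cost(level: int, P: int) -> int:
    """Run-length units of work to build one run of length P**level with Sobel's scheme."""
    if level == 0:
        return 1                                        # distribute one initial run
    return P * sobel_cost(level - 1, P) + P ** level    # P smaller runs, then one P-way merge


print(sobel_cost(1, 4), sobel_cost(2, 4))
```

The recursion solves to $(L + 1)P^L$, which is the "each record exactly m + 1 times" count stated next.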

When the number of initial runs, S, is $4^m$, it is not difficult to see that this
method processes each record exactly m + 1 times: once during the distribution,
and m times during a merge. When S is between $4^{m-1}$ and $4^m$, we could assume
that dummy runs are present, bringing S up to $4^m$; hence the total sorting time
would essentially amount to $\lceil\log_4 S\rceil + 1$ passes over all the data. This is just
what would be achieved by a balanced sort on eight tapes; in general, oscillating
sort with T work tapes is equivalent to balanced merging with 2(T − 1) tapes,
since it makes

$$\lceil\log_{T-1} S\rceil + 1$$

passes over the data. When S is a power of T − 1, this is the best any T-tape
method could possibly do, since it achieves the lower bound in Eq. 5.4.4-(9). On
the other hand, when S is

$$(T-1)^{m-1} + 1,$$

just one higher than a power of T − 1, the method wastes nearly a whole pass.
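The pass count can be computed exactly with integers rather than through a floating-point logarithm. That makes the jump just past each power of T − 1 easy to see. This is a minimal sketch with a hypothetical `passes` helper. It assumes, as above, that dummy runs pad S up to the next power of T − 1.

```python
def passes(S: int, T: int) -> int:
    """ceil(log_{T-1} S) + 1, computed with integers to avoid floating-point rounding."""
    if S < 1 or T < 3:
        raise ValueError("need S >= 1 and T >= 3")
    m, capacity = 0, 1
    while capacity < S:          # smallest m with (T-1)**m >= S
        capacity *= T - 1
        m += 1
    return m + 1


for S in (16, 17, 64, 65):
    print(S, passes(S, 5))
```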

Exercise 2 shows how to eliminate part of this penalty for non-perfect-powers
S, by using a special ending routine. A further refinement was discovered
in  1966  by  Dennis  L.  Bencher,  who  called  his  procedure  the  “criss-cross  merge” 
[see  H.  Wedekind,  Datenorganisation  (Berlin:  W.  de  Gruyter,  1970),  164-166; 
see  also  U.S.  Patent  3540000  (1970)].  The  main  idea  is  to  delay  merging  until 
more  knowledge  of  S has  been  gained.  We  shall  discuss  a slightly  modified  form 
of  Bencher’s  original  scheme. 
This improved oscillating sort proceeds as follows:

Operation              T1      T2      T3      T4      T5      Cost
Phase 1   Distribute   —       A1      A1      A1      A1         4
Phase 2   Distribute   —       A1      A1A1    A1A1    A1A1       3
Phase 3   Merge        D4      —       A1      A1      A1         4
Phase 4   Distribute   D4A1    —       A1      A1A1    A1A1       3
Phase 5   Merge        D4      D4      —       A1      A1         4
Phase 6   Distribute   D4A1    D4A1    —       A1      A1A1       3
Phase 7   Merge        D4      D4      D4      —       A1         4
Phase 8   Distribute   D4A1    D4A1    D4A1    —       A1         3
Phase 9   Merge        D4      D4      D4      D4      —          4

We do not merge the D4’s into an A16 at this point (unless the input happens
to be exhausted); only after building up to

Phase 15  Merge        D4D4    D4D4    D4D4    D4      —          4

will we get

Phase 16  Merge        D4      D4      D4      —       A16       16

The second A16 will occur after three more D4’s have been made,

Phase 22  Merge        D4D4    D4D4    D4      —       A16D4      4
Phase 23  Merge        D4      D4      —       A16     A16       16
and  so  on  (compare  with  Phases  1-5).  The  advantage  of  Bencher’s  scheme  can  be 
seen  for  example  if  there  are  only  five  initial  runs:  Oscillating  sort  as  modified 
in  exercise  2 would  do  a four-way  merge  (in  Phase  2)  followed  by  a two-way 
merge,  for  a total  cost  of  4 + 4 + 1 + 5 = 14,  while  Bencher’s  scheme  would  do 
a two-way  merge  (in  Phase  3)  followed  by  a four-way  merge,  for  a total  cost  of 
4 + 1 + 2 + 5 = 12. Both methods also involve a small additional cost, namely
one  unit  of  rewind  before  the  final  merge. 

A precise  description  of  Bencher’s  method  appears  in  Algorithm  B below. 
Unfortunately  it  seems  to  be  a procedure  that  is  harder  to  understand  than  to 
code;  it  is  much  easier  to  explain  the  technique  to  a computer  than  to  a computer 
scientist!  This  is  partly  because  it  is  an  inherently  recursive  method  that  has 
been  expressed  in  iterative  form  and  then  optimized  somewhat;  the  reader  may 
find  it  necessary  to  trace  through  the  operation  of  this  algorithm  several  times 
before  discovering  what  is  really  going  on. 

Algorithm B (Oscillating sort with “criss-cross” distribution). This algorithm
takes initial runs and disperses them to tapes, occasionally interrupting the
distribution process in order to merge some of the tape contents. The algorithm
uses P-way merging, assuming that T = P + 1 ≥ 3 tape units are available,
not counting the unit that may be necessary to hold the input data. The tape
units must allow reading in both forward and backward directions, and they are
designated by the numbers 0, 1, . . . , P. The following tables are maintained:
D[j], 0 ≤ j ≤ P: Number of dummy runs assumed to be present at the end of
tape j.

A[l, j], 0 ≤ l ≤ L, 0 ≤ j ≤ P: Here L is a number such that at most $P^{L+1}$ initial
runs will be input. When A[l, j] = k ≥ 0, a run of nominal length $P^k$ is present
on tape j, corresponding to “level l” of the algorithm’s operation. This run is
ascending if k is even, descending if k is odd. When A[l, j] < 0, level l does not
use tape j.

The statement “Write an initial run on tape j” is an abbreviation for the
following operations:

Set A[l, j] ← 0. If the input is exhausted, increase D[j] by 1; otherwise
write an initial run (in ascending order) onto tape j.

The statement “Merge to tape j” is an abbreviation for the following operations:
If D[i] > 0 for all i ≠ j, decrease D[i] by 1 for all i ≠ j and increase D[j]
by 1. Otherwise merge one run to tape j, from all tapes i ≠ j such that
D[i] = 0, and decrease D[i] by 1 for all other i ≠ j.


Fig.  77.  Oscillating  sort,  with  a “criss-cross”  distribution. 


B1. [Initialize.] Set D[j] ← 0 for 0 ≤ j ≤ P. Set A[0, 0] ← −1, l ← 0, q ← 0.
Then write an initial run on tape j, for 1 ≤ j ≤ P.

B2. [Input complete?] (At this point tape q is empty and the other tapes contain
at most one run each.) If there is more input, go on to step B3. But if
the input is exhausted, rewind all tapes j ≠ q such that A[0, j] is even;
then merge to tape q, reading forwards on tapes just rewound, and reading
backwards on the other tapes. This completes the sort, with the output in
ascending order on tape q.

B3. [Begin new level.] Set l ← l + 1, r ← q, s ← 0, and q ← (q + 1) mod T.
Write an initial run on tape (q + j) mod T, for 1 ≤ j ≤ T − 2. (Thus an
initial run is written onto each tape except tapes q and r.) Set A[l, q] ← −1
and A[l, r] ← −2.

B4. [Ready to merge?] If A[l − 1, q] ≠ s, go back to step B3.

B5. [Merge.] (At this point A[l − 1, q] = A[l, j] = s for all j ≠ q, j ≠ r.)
Merge to tape r, reading backwards. (See the definition of this operation
above.) Then set s ← s + 1, l ← l − 1, A[l, r] ← s, and A[l, q] ← −1. Set
r ← (2q − r) mod T. (In general, we have r = (q − 1) mod T when s is even,
r = (q + 1) mod T when s is odd.)

B6. [Is level complete?] If l = 0, go to B2. Otherwise if A[l, j] = s for all j ≠ q
and j ≠ r, go to B4. Otherwise return to B3. |
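Since the method is easier to explain to a computer than to a person, a direct transcription can help in tracing it. The Python sketch below is one such transcription of steps B1 to B6 as stated above, with these simplifications:

- A run is represented only by its relative length and direction.
- Keys are never compared.
- Rewinding costs nothing.
- All function and variable names are illustrative, not part of the algorithm.

The `assert` in `merge_to` checks that every merge receives its input runs in one common key order.

```python
def algorithm_b(S: int, T: int):
    """Trace Algorithm B for S unit-length initial runs on tapes 0..T-1.

    Each run is modelled only as (length, ascending); keys and rewind time are ignored.
    Returns (q, tapes, cost, rewound) after the final merge to tape q.
    """
    if S < 1 or T < 3:
        raise ValueError("need S >= 1 and T >= 3")
    P = T - 1
    tapes = [[] for _ in range(T)]   # runs from the start of each tape to its current end
    D = [0] * T                      # D[j]: dummy runs assumed at the end of tape j
    A = []                           # A[l][j]: the level table, one row per level reached
    remaining = S                    # initial runs not yet read from the input
    cost = {"distribute": 0, "merge": 0}

    def row(l):
        while len(A) <= l:
            A.append([None] * T)
        return A[l]

    def write_initial_run(l, j):
        nonlocal remaining
        row(l)[j] = 0
        if remaining == 0:
            D[j] += 1                # input exhausted: count a dummy run instead
        else:
            remaining -= 1
            tapes[j].append((1, True))
            cost["distribute"] += 1

    def merge_to(j, forward=frozenset()):
        others = [i for i in range(T) if i != j]
        if all(D[i] > 0 for i in others):
            for i in others:
                D[i] -= 1
            D[j] += 1                # merging only dummies yields another dummy
            return
        total, orders = 0, set()
        for i in others:
            if D[i] > 0:
                D[i] -= 1            # this tape contributes a dummy run
                continue
            # At step B2 each tape holds at most one run, so taking the last run
            # is also correct for the tapes read forwards after rewinding.
            length, ascending = tapes[i].pop()
            # A run read backwards delivers its keys in the opposite order.
            orders.add(ascending if i in forward else not ascending)
            total += length
        assert len(orders) == 1, "inputs must deliver keys in one common order"
        tapes[j].append((total, orders.pop()))
        cost["merge"] += total

    row(0)[0] = -1                   # B1
    l = q = 0
    r = s = 0
    for j in range(1, P + 1):
        write_initial_run(0, j)
    step = "B2"
    while True:
        if step == "B2":
            if remaining == 0:
                rewound = {j for j in range(T) if j != q and A[0][j] % 2 == 0}
                merge_to(q, forward=rewound)
                return q, tapes, cost, rewound
            step = "B3"
        elif step == "B3":
            l, r, s, q = l + 1, q, 0, (q + 1) % T
            for j in range(1, T - 1):
                write_initial_run(l, (q + j) % T)
            row(l)[q], row(l)[r] = -1, -2
            step = "B4"
        elif step == "B4":
            step = "B5" if A[l - 1][q] == s else "B3"
        elif step == "B5":
            merge_to(r)              # reading backwards on every input tape
            s, l = s + 1, l - 1
            A[l][r], A[l][q] = s, -1
            r = (2 * q - r) % T
            step = "B6"
        else:                        # B6
            if l == 0:
                step = "B2"
            elif all(A[l][j] == s for j in range(T) if j not in (q, r)):
                step = "B4"
            else:
                step = "B3"


q, tapes, cost, rewound = algorithm_b(5, 5)
print(q, tapes[q], cost, sorted(rewound))
```

For S = 5 on five tapes, this trace reports a distribution cost of 5 and a merge cost of 2 + 5 = 7, giving the total of 12 quoted above. Tapes 2, 3, and 4 are rewound before the final merge. For S = 16 it gives 16 + 32 = 48 with no rewinding, the same as Sobel's original method.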

We can use a “recursion induction” style of proof to show that this
algorithm is valid, just as we have done for Algorithm 2.3.1T. Suppose that
we begin at step B3 with $l = l_0$, $q = q_0$, $s_+ = A[l_0, (q_0 + 1) \bmod T]$, and
$s_- = A[l_0, (q_0 - 1) \bmod T]$; and assume furthermore that either $s_+ = 0$ or $s_- = 1$
or $s_+ = 2$ or $s_- = 3$ or · · · . It is possible to verify by induction that the algorithm
will eventually get to step B5 without changing rows 0 through $l_0$ of A, and with
$l = l_0 + 1$, $q = q_0 \pm 1$, $r = q_0$, and $s = s_+$ or $s_-$, where we choose the + sign if
$s_+ = 0$ or ($s_+ = 2$ and $s_- \ne 1$) or ($s_+ = 4$ and $s_- \ne 1, 3$) or · · · , and we choose
the − sign if ($s_- = 1$ and $s_+ \ne 0$) or ($s_- = 3$ and $s_+ \ne 0, 2$) or · · · . The proof
sketched here is not very elegant, but the algorithm has been stated in a form
more suited to implementation than to verification.

Figure 78 shows the efficiency of Algorithm B, in terms of the average number
of times each record is merged as a function of the number S of initial runs,
assuming  that  the  initial  runs  are  approximately  equal  in  length.  (Corresponding 
graphs  for  polyphase  and  cascade  sort  have  appeared  in  Figs.  70  and  74.)  A slight 
improvement,  mentioned  in  exercise  3,  has  been  used  in  preparing  this  chart. 

A related  method  called  the  gyrating  sort  was  developed  by  R.  M.  Karp, 
based  on  the  theory  of  preorder  merging  that  we  have  discussed  in  Section  5.4.4; 
see  Combinatorial  Algorithms,  edited  by  Randall  Rustin  (Algorithmics  Press, 
1972),  21-29. 

Reading  forwards.  The  oscillating  sort  pattern  appears  to  require  a read- 
backwards  capability,  since  we  need  to  store  long  runs  somewhere  as  we  merge 
newly  input  short  runs.  However,  M.  A.  Goetz  [Proc.  AFIPS  Spring  Joint 
Comp.  Conf.  25  (1964),  599-607]  has  discovered  a way  to  perform  an  oscillating 
sort  using  only  forward  reading  and  simple  rewinding.  His  method  is  radically 
different  from  the  other  schemes  we  have  seen  in  this  chapter,  in  two  ways: 

a)  Data  is  sometimes  written  at  the  front  of  the  tape,  with  the  understanding 
that  the  existing  data  in  the  middle  of  the  tape  is  not  destroyed. 

b)  All  initial  runs  have  a fixed  maximum  length. 

Condition  (a)  violates  the  first-in-first-out  property  we  have  assumed  to  be 
characteristic  of  forward  reading,  but  it  can  be  implemented  reliably  if  a sufficient 
amount  of  blank  tape  is  left  between  runs  and  if  parity  errors  are  ignored  at 
appropriate  times.  Condition  (b)  tends  to  be  somewhat  incompatible  with  an 
efficient  use  of  replacement  selection. 

Goetz’s read-forward oscillating sort has the somewhat dubious distinction
of being one of the first algorithms to be patented as an algorithm instead of as
a physical device [U.S. Patent 3380029 (1968)]; between 1968 and 1988, no one in
the U.S.A. could legally use the algorithm in a program without permission of the
patentee.  Bencher’s  read-backward  oscillating  sort  technique  was  patented  by 
IBM  several  years  later.  [Alas,  we  have  reached  the  end  of  the  era  when  the  joy  of 
discovering  a new  algorithm  was  satisfaction  enough!  Fortunately  the  oscillating 
sort  isn’t  especially  good;  let’s  hope  that  community-minded  folks  who  invent 
the  best  algorithms  continue  to  make  their  ideas  freely  available.  Of  course  the 
specter  of  people  keeping  new  techniques  completely  secret  is  far  worse  than  the 
public  appearance  of  algorithms  that  are  proprietary  for  a limited  time.] 

The central idea in Goetz’s method is to arrange things so that each tape
begins with a run of relative length 1, followed by one of relative length P, then
$P^2$, etc. For example, when T = 5 the sort begins as follows, using a dot (.) to
indicate the current position of the read-write head on each tape:


Operation              T1         T2        T3        T4        T5        “Cost”  Remarks
Phase 1   Distribute   .A1        .A1       .A1       .A1       A1.          5    [T5 not rewound]
Phase 2   Merge        A1.        A1.       A1.       A1.       A1A4.        4    [Now rewind all]
Phase 3   Distribute   .A1        .A1       .A1       A1.       .A1A4        4    [T4 not rewound]
Phase 4   Merge        A1.        A1.       A1.       A1A4.     A1.A4        4    [Now rewind all]
Phase 5   Distribute   .A1        .A1       A1.       .A1A4     .A1A4        4    [T3 not rewound]
Phase 6   Merge        A1.        A1.       A1A4.     A1.A4     A1.A4        4    [Now rewind all]
Phase 7   Distribute   .A1        A1.       .A1A4     .A1A4     .A1A4        4    [T2 not rewound]
Phase 8   Merge        A1.        A1A4.     A1.A4     A1.A4     A1.A4        4    [Now rewind all]
Phase 9   Distribute   A1.        .A1A4     .A1A4     .A1A4     .A1A4        4    [T1 not rewound]
Phase 10  Merge        A1A4.      A1.A4     A1.A4     A1.A4     A1.A4        4    [No rewinding]
Phase 11  Merge        A1A4A16.   A1A4.     A1A4.     A1A4.     A1A4.       16    [Now rewind all]

And so on. During Phase 1, T1 was rewinding while T2 was receiving its input,
then T2 was rewinding while T3 was receiving input, etc. Eventually, when the
input is exhausted, dummy runs will start to appear, and we will sometimes
need to imagine that they were written explicitly on the tape at full length. For
example, if S = 18, the A1’s on T4 and T5 would be dummies during Phase 9;
we would have to skip forwards on T4 and T5 while merging from T2 and T3
to T1 during Phase 10, because we have to get to the A4’s on T4 and T5 in
preparation for Phase 11. On the other hand, the dummy A1 on T1 need not
appear explicitly. Thus the “endgame” is a bit tricky.

Another  example  of  this  method  appears  in  the  next  section. 

EXERCISES 

1. [22] The text illustrates Sobel’s original oscillating sort for T = 5 and S = 16.
Give a precise specification of an algorithm that generalizes the procedure, sorting
$S = P^L$ initial runs on T = P + 1 ≥ 3 tapes. Strive for simplicity.

2. [24] If S = 6 in Sobel’s original method, we could pretend that S = 16 and that
10 dummy runs were present. Then Phase 3 in the text’s example would put dummy
runs A0 on T4 and T5; Phase 4 would merge the A1’s on T2 and T3 into a D2 on T1;
Phases 5–8 would do nothing; and Phase 9 would produce A6 on T4. It would be better
to rewind T2 and T3 just after Phase 3, then to produce A6 immediately on T4 by
three-way merging.

Show how to modify the algorithm of exercise 1, so that an improved ending like
this is obtained when S is not a perfect power of P.

[Fig. 78 plot: one curve for each of T = 3, 4, 5, 6, 8, 10; horizontal axis: initial runs, S, from 1 to 5000 on a logarithmic scale]

Fig. 78. Efficiency of oscillating sort, using the technique of Algorithm B and exercise 3.

► 3.  [29]  Prepare  a chart  showing  the  behavior  of  Algorithm  B when  T = 3,  assuming 
that  there  are  nine  initial  runs.  Show  that  the  procedure  is  obviously  inefficient  in  one 
place,  and  prescribe  corrections  to  Algorithm  B that  will  remedy  the  situation. 

4. [21] Step B3 sets A[l, q] and A[l, r] to negative values. Show that one of these
two  operations  is  always  superfluous,  since  the  corresponding  A table  entry  is  never 
looked  at. 

5.  [M25]  Let  S be  the  number  of  initial  runs  present  in  the  input  to  Algorithm  B. 
Which  values  of  S require  no  rewinding  in  step  B2? 

*5.4.6.  Practical  Considerations  for  Tape  Merging 

Now  comes  the  nitty-gritty:  We  have  discussed  the  various  families  of  merge 
patterns,  so  it  is  time  to  see  how  they  actually  apply  to  real  configurations  of 
computers  and  magnetic  tapes,  and  to  compare  them  in  a meaningful  way.  Our 
study  of  internal  sorting  showed  that  we  can’t  adequately  judge  the  efficiency  of  a 
sorting method merely by counting the number of comparisons it performs;
similarly we can’t properly evaluate an external sorting method by simply knowing
the  number  of  passes  it  makes  over  the  data. 
In this section we shall discuss the characteristics of typical tape units, and
the  way  they  affect  initial  distribution  and  merging.  In  particular  we  shall  study 
some  schemes  for  buffer  allocation,  and  the  corresponding  effects  on  running 
time.  We  also  shall  consider  briefly  the  construction  of  sort  generator  programs. 

How  tape  works.  Different  manufacturers  have  provided  tape  units  with  widely 
varying  characteristics.  For  convenience,  we  shall  define  a hypothetical  MIXT  tape 
unit,  which  is  reasonably  typical  of  the  equipment  that  was  being  manufactured 
at  the  time  this  book  was  first  written.  MIXT  reads  and  writes  800  characters  per 
inch of tape, at a rate of 75 inches per second. This means that one character
is read or written every 1/60 ms, or 16⅔ microseconds, when the tape is active.
Actual tape units that were available in 1970 had densities ranging from 200 to
1600 characters per inch, and tape speeds ranging from 37½ to 150 inches per
second,  so  their  effective  speed  varied  from  1/8  to  4 times  as  fast  as  MIXT. 
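The MIXT figures are plain unit conversions. A quick check with exact fractions (the variable names are ad hoc) reproduces the 16⅔-microsecond character time and the 1/8 to 4 speed range:

```python
from fractions import Fraction

density, speed = 800, 75                      # MIXT: characters per inch, inches per second
rate = density * speed                        # characters per second
microseconds_per_char = Fraction(10**6, rate)

slowest = Fraction(200) * Fraction(75, 2) / rate   # 200 char/in at 37.5 in/s
fastest = Fraction(1600 * 150, rate)               # 1600 char/in at 150 in/s
print(rate, microseconds_per_char, slowest, fastest)
```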

Of  course,  we  observed  near  the  beginning  of  Section  5.4  that  magnetic  tapes 
in  general  are  now  pretty  much  obsolete.  But  many  lessons  were  learned  during 
the  decades  when  tape  sorting  was  of  major  importance,  and  those  lessons  are 
still  valuable.  Thus  our  main  concern  here  is  not  to  obtain  particular  answers;  it 
is  to  learn  how  to  combine  theory  and  practice  in  a reasonable  way.  Methodology 
is  much  more  important  than  phenomenology,  because  the  principles  of  problem 
solving  remain  useful  despite  technological  changes.  Readers  will  benefit  most 
from  this  material  by  transplanting  themselves  temporarily  into  the  mindset  of 
the  1970s.  Let  us  therefore  pretend  that  we  still  live  in  that  bygone  era. 

One  of  the  important  considerations  to  keep  in  mind,  as  we  adopt  the 
perspective  of  the  early  days,  is  the  fact  that  individual  tapes  have  a strictly 
limited  capacity.  Each  reel  contains  2400  feet  of  tape  or  less;  hence  there  is 
room for at most 23,000,000 or so characters per reel of MIXT tape, and it takes
about 23000000/3600000 ≈ 6.4 minutes to read them all. If larger files must be
sorted, it is generally best to sort one reelful at a time, and then to merge the
individually sorted reels, in order to avoid excessive tape handling. This means
that the number of initial runs, S, actually present in the merge patterns we have
been studying is never extremely large. We will never find S > 5000, even with a
very small internal memory that produces initial runs only 5000 characters long.
Consequently the formulas that give asymptotic efficiency of the algorithms as
S → ∞ are primarily of academic interest.
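The capacity and timing estimates in this paragraph can be reproduced the same way. This sketch assumes a full 2400-foot reel and ignores interblock gaps, so its capacity figure is an upper bound; the helper `max_initial_runs` is a hypothetical name used only for illustration.

```python
REEL_FEET = 2400
DENSITY_CPI = 800          # characters per inch
RATE_CPS = 60_000          # MIXT characters per second

# Raw character positions on a full reel, before any interblock gaps.
raw_positions = REEL_FEET * 12 * DENSITY_CPI          # 23,040,000

# Minutes needed to read the text's round figure of 23,000,000 characters.
minutes_to_read = 23_000_000 / (RATE_CPS * 60)        # about 6.4

def max_initial_runs(reel_chars, run_chars):
    """Upper bound on S when a single reel is sorted at a time."""
    return reel_chars // run_chars

print(raw_positions, round(minutes_to_read, 1), max_initial_runs(23_000_000, 5000))
```

With runs only 5000 characters long, one reel yields at most 4600 runs, which is why S never exceeds 5000.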

Data  appears  on  tape  in  blocks  (Fig.  79),  and  each  read/write  instruction 
transmits  a single  block.  Tape  blocks  are  often  called  “records,”  but  we  shall 
avoid  that  terminology  because  it  conflicts  with  the  fact  that  we  are  sorting  a 
file  of  “records”  in  another  sense.  Such  a distinction  was  unnecessary  on  many 
of  the  early  sorting  programs  written  during  the  1950s,  since  one  record  was 
written  per  block;  but  we  shall  see  that  it  is  usually  advantageous  to  have  quite 
a few  records  in  every  block  on  the  tape. 

An interblock gap, 480 character positions long, appears between adjacent
blocks,  in  order  to  allow  the  tape  to  stop  and  to  start  between  individual  read 
or  write  commands.  The  effect  of  interblock  gaps  is  to  decrease  the  number  of 
characters per reel of tape, depending on the number of characters per block (see
Fig.  80);  and  the  average  number  of  characters  transmitted  per  second  decreases 
in  the  same  way,  since  tape  moves  at  a fairly  constant  speed. 


Fig. 80. The number of characters per reel of MIXT tape, as a function of the block size. [Horizontal axis: characters per block, from 0 to 5000.]

Many  old-fashioned  computers  had  fixed  block  sizes  that  were  rather  small; 
their  design  was  reflected  in  the  MIX  computer  as  defined  in  Chapter  1,  which 
always reads and writes 100-word blocks. But MIX's convention corresponds to
about  500  characters  per  block,  and  480  characters  per  gap,  hence  almost  half 
the  tape  is  wasted!  Most  machines  of  the  1970s  therefore  allowed  the  block  size 
to  be  variable;  we  shall  discuss  the  choice  of  appropriate  block  sizes  below. 
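The following sketch tabulates the effect plotted in Fig. 80. It assumes that every block is followed by a full 480-character gap and ignores the leader at load point; `chars_per_reel` is an illustrative name, not something defined in the text.

```python
REEL_POSITIONS = 2400 * 12 * 800   # character positions on a full 2400-foot reel
GAP = 480                          # interblock gap, in character positions

def chars_per_reel(block_size, positions=REEL_POSITIONS, gap=GAP):
    """Data characters that fit on one reel when each block is followed by a gap."""
    blocks = positions // (block_size + gap)
    return blocks * block_size

for b in (500, 1000, 5000):
    used = chars_per_reel(b)
    print(b, used, f"{used / REEL_POSITIONS:.0%}")
```

Under this model, 500-character blocks (roughly MIX's 100-word blocks) fill only about half of the reel with data, while 5000-character blocks use about 91 percent of it.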

At  the  end  of  a read  or  write  operation,  the  tape  unit  “coasts”  at  full  speed 
over  the  first  66  characters  (or  so)  of  the  gap.  If  the  next  operation  for  the  same 
tape  is  initiated  during  this  time,  the  tape  motion  continues  without  interruption. 
But  if  the  next  operation  doesn’t  come  soon  enough,  the  tape  will  stop  and  it 
will  also  require  some  time  to  accelerate  to  full  speed  on  the  next  operation.  The 
combined  stop/start  time  delay  is  5 ms,  2 for  the  stop  and  3 for  the  start  (see 
Fig.  81).  Thus  if  we  just  miss  the  chance  to  have  continuous  full-speed  reading, 
the  effect  on  running  time  is  essentially  the  same  as  if  there  were  780  characters 
instead  of  480  in  the  interblock  gap. 
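A rough model of this timing is sketched below. It treats Fig. 81 as a simple step function, which is an assumption on our part: the command either arrives while the tape is still coasting over the first 66 gap characters, or the full 5 ms stop/start penalty is charged. All names are ours.

```python
CHARS_PER_MS = 60        # MIXT moves 60 characters past the head per millisecond
GAP = 480                # interblock gap, in characters
COAST_CHARS = 66         # gap characters passed at full speed after an operation ends
STOP_START_MS = 5        # 2 ms to stop plus 3 ms to start again

def block_time_ms(block_chars, idle_ms):
    """Time to pass one block and its gap, under a simplified step model.

    idle_ms is the time from the end of one operation to the next command.
    """
    time = (block_chars + GAP) / CHARS_PER_MS
    if idle_ms > COAST_CHARS / CHARS_PER_MS:   # coasting window is 1.1 ms
        time += STOP_START_MS
    return time

# The penalty expressed as extra gap length: 480 + 5 * 60 = 780 characters.
effective_gap = GAP + STOP_START_MS * CHARS_PER_MS

print(block_time_ms(5000, 0.5), block_time_ms(5000, 2.0), effective_gap)
```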

Fig. 81. How to compute the stop/start delay time. (This gets added to the time used for reading or writing the blocks and the gaps.) [Horizontal axis: time from completion of previous operation to initiation of next command to tape controller (ms).]

Now let us consider the operation of rewinding. Unfortunately, the exact
time needed to rewind over a given number n of characters is not easy to
characterize. On some machines there is a high-speed rewind that applies only
when n is greater than 5 million or so; for smaller values of n, rewinding goes at
normal read/write speed. On other machines a special motor is used to control
all of the rewind operations; it gradually accelerates the tape reel to a certain
number of revolutions per minute, then puts on the brakes when it is time to
stop, and the actual tape speed varies with the fullness of the reel. For simplicity,
we shall assume that MIXT requires max(30, n/150) ms to rewind over n character
positions (including gaps), roughly two-fifths as long as it took to write them.
This is a reasonably good approximation to the behavior of many actual tape
units, where the ratio of read/write time to rewind time is generally between 2
and 3, but it does not adequately model the effect of combined low-speed and
high-speed rewind that is present on many other machines. (See Fig. 82.)

Initial  loading  and/or  rewinding  will  position  a tape  at  “load  point,”  and  an 
extra  110  ms  are  necessary  for  any  read  or  write  operation  initiated  at  load  point. 
When  the  tape  is  not  at  load  point,  it  may  be  read  backwards;  an  extra  32  ms  is 
added  to  the  time  of  any  backward  operation  following  a forward  operation  or 
any  forward  operation  following  a backward  one. 
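The MIXT rewind rule and the two fixed penalties can be collected into a small timing model. This is a sketch that adopts the text's single max(30, n/150) ms rewind curve; the function names are hypothetical.

```python
CHARS_PER_MS = 60       # MIXT read/write speed: 60 characters per millisecond
LOAD_POINT_MS = 110     # extra time for a read or write started at load point
REVERSAL_MS = 32        # extra time when the direction of motion changes

def rewind_ms(n_chars):
    """MIXT rewind model from the text: max(30, n/150) ms over n positions."""
    return max(30.0, n_chars / 150)

def transfer_ms(n_chars):
    """Time to pass n character positions at read/write speed."""
    return n_chars / CHARS_PER_MS

def fixed_penalty_ms(at_load_point, reverses_direction):
    """Fixed delays added to one read or write operation."""
    return ((LOAD_POINT_MS if at_load_point else 0)
            + (REVERSAL_MS if reverses_direction else 0))

n = 10_000_000
print(rewind_ms(n) / transfer_ms(n))     # about 0.4, i.e. two-fifths
print(rewind_ms(23_000_000) / 60_000)    # full reel: about 2.6 minutes
```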


Fig. 82. Approximate running time for two commonly used rewind techniques. [Horizontal axis: number of characters from load point, from 0 to 23,000,000.]
Merging revisited. Let us now look again at the process of P-way merging,
with an emphasis on input and output activities, assuming that P + 1 tape units
are being used for the input files and the output file. Our goal is to overlap
the input/output operations as much as possible with each other and with the
computations  of  the  program,  so  that  the  overall  merging  time  is  minimized. 

It  is  instructive  to  consider  the  following  special  case,  in  which  serious 
restrictions  are  placed  on  the  amount  of  simultaneity  possible.  Suppose  that 

a)  at  most  one  tape  may  be  written  on  at  any  one  time; 

b)  at  most  one  tape  may  be  read  from  at  any  one  time; 

c)  reading,  writing,  and  computing  may  take  place  simultaneously  only  when 
the  read  and  write  operations  have  been  initiated  simultaneously. 

It turns out that a system of 2P input buffers and 2 output buffers is sufficient
to  keep  the  tape  moving  at  essentially  its  maximum  speed,  even  though  these 
three  restrictions  are  imposed,  unless  the  computer  is  unusually  slow.  Note  that 
condition  (a)  is  not  really  a restriction,  since  there  is  only  one  output  tape. 
Furthermore  the  amount  of  input  is  equal  to  the  amount  of  output,  so  there  is 
only  one  tape  being  read,  on  the  average,  at  any  given  time;  if  condition  (b)  is 
not  satisfied,  there  will  necessarily  be  periods  when  no  input  at  all  is  occurring. 
Thus  we  can  minimize  the  merging  time  if  we  keep  the  output  tape  busy. 

An important technique called forecasting leads to the desired effect. While
we are doing a P-way merge, we generally have P current input buffers, which
are  being  used  as  the  source  of  data;  some  of  them  are  more  full  than  others, 
depending  on  how  much  of  their  data  has  already  been  scanned.  If  all  of  them 
become  empty  at  about  the  same  time,  we  will  need  to  do  a lot  of  reading  before 
we  can  proceed  further,  unless  we  have  foreseen  this  eventuality  in  advance. 
Fortunately  it  is  always  possible  to  tell  which  buffer  will  empty  first,  by  simply 
looking  at  the  last  record  in  each  buffer.  The  buffer  whose  last  record  has  the 
smallest  key  will  always  be  the  first  one  empty,  regardless  of  the  values  of  any 
other  keys;  so  we  always  know  which  file  should  be  the  source  of  our  next  input 
command.  The  following  algorithm  spells  out  this  principle  in  detail. 
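The forecasting principle is easy to test directly. In this sketch, which assumes distinct keys, `forecast` looks only at the last key of each buffer, and `first_exhausted` checks the prediction by merging until some buffer runs dry. Both names are ours.

```python
def forecast(buffers):
    """Index of the buffer predicted to empty first: the smallest last key."""
    return min(range(len(buffers)), key=lambda i: buffers[i][-1])

def first_exhausted(buffers):
    """Brute-force check: merge the buffers until one of them runs dry."""
    pos = [0] * len(buffers)
    while True:
        live = [j for j in range(len(buffers)) if pos[j] < len(buffers[j])]
        i = min(live, key=lambda j: buffers[j][pos[j]])
        pos[i] += 1
        if pos[i] == len(buffers[i]):
            return i

bufs = [[3, 8, 21], [5, 6, 7, 30], [1, 12]]   # unmerged keys, one buffer per file
print(forecast(bufs), first_exhausted(bufs))   # 2 2
```

Only the last keys (21, 30, 12) are needed to pick buffer 2; the other keys cannot change the answer.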

Algorithm F (Forecasting with floating buffers). This algorithm controls the
buffering during a P-way merge of long input files, for P ≥ 2. Assume that the
input tapes and files are numbered 1, 2, ..., P. The algorithm uses 2P input
buffers I[1], ..., I[2P]; two output buffers O[0] and O[1]; and the following
auxiliary tables:

A[j], 1 ≤ j ≤ 2P: 0 if I[j] is available for input, 1 otherwise.
B[i], 1 ≤ i ≤ P: Index of the buffer holding the last block read so far from file i.
C[i], 1 ≤ i ≤ P: Index of the buffer currently being used for the input from file i.
L[i], 1 ≤ i ≤ P: The last key read so far from file i.
S[j], 1 ≤ j ≤ 2P: Index of the buffer to use when I[j] becomes empty.

The algorithm described here does not terminate; an appropriate way to shut it
off is discussed below.
Fig. 83. Forecasting with floating buffers.


F1. [Initialize.] Read the first block from tape i into buffer I[i], set A[i] ← 1,
A[P+i] ← 0, B[i] ← i, C[i] ← i, and set L[i] to the key of the final
record in buffer I[i], for 1 ≤ i ≤ P. Then find m such that L[m] =
min{L[1], ..., L[P]}; and set t ← 0, k ← P + 1. Begin to read from tape
m into buffer I[k].

F2. [Merge.] Merge records from buffers I[C[1]], ..., I[C[P]] to O[t], until
O[t] is full. If during this process an input buffer, say I[C[i]], becomes
empty and O[t] is not yet full, set A[C[i]] ← 0, C[i] ← S[C[i]], and
continue to merge.

F3. [I/O complete.] Wait until the previous read (or read/write) operation is
complete. Then set A[k] ← 1, S[B[m]] ← k, B[m] ← k, and set L[m] to
the key of the final record in I[k].

F4. [Forecast.] Find m such that L[m] = min{L[1], ..., L[P]}, and find k such
that A[k] = 0.

F5. [Read/write.] Begin to read from tape m into buffer I[k], and to write from
buffer O[t] onto the output tape. Then set t ← 1 − t and return to F2. ∎
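To make the bookkeeping concrete, here is a sequential Python sketch of Algorithm F. It is an illustration under several assumptions of ours: a read is modeled as delivering its whole block at step F3 (so no real overlap of I/O and computing takes place), keys are distinct, every run is nonempty, the two output buffers O[0] and O[1] are collapsed into one list, and the end of each run is handled with the ∞ convention for L[m] that the text gives for shutting the algorithm off. Using 0 for "no successor" and the extra assignment S[k] ← 0 are additions of the sketch. The two assertions check claims (i) and (ii) of the validity argument that follows.

```python
import math

def algorithm_f(files, b):
    """Sequential sketch of Algorithm F's buffer bookkeeping (not real I/O).

    files[i-1] is the ascending run on tape i; b is the block size in records.
    Returns the merged output.
    """
    P = len(files)
    blocks = [None] + [[f[j:j + b] for j in range(0, len(f), b)] for f in files]
    nxt = [0] * (P + 1)                  # number of blocks read so far from tape i
    I = [None] * (2 * P + 1)             # I[j] = [records, next position], 1-based
    A = [0] * (2 * P + 1)
    S = [0] * (2 * P + 1)                # 0 plays the role of "no successor"
    B, C, L = [0] * (P + 1), [0] * (P + 1), [math.inf] * (P + 1)
    total, out = sum(len(f) for f in files), []

    def load(i, j):
        """Deliver the next block of tape i into I[j]; return its forecast key."""
        blk = blocks[i][nxt[i]]
        nxt[i] += 1
        I[j] = [blk, 0]
        # End of the run: use infinity, so tape i is never chosen again.
        return math.inf if nxt[i] == len(blocks[i]) else blk[-1]

    def forecast():
        m = min(range(1, P + 1), key=lambda i: L[i])
        return None if L[m] == math.inf else m

    # F1. Initialize.
    for i in range(1, P + 1):
        L[i] = load(i, i)
        A[i], A[P + i], B[i], C[i] = 1, 0, i, i
    m, k = forecast(), P + 1             # a read from tape m into I[k] is pending

    while len(out) < total:
        # F2. Merge into the output buffer until it holds b records.
        block = []
        while len(block) < b and len(out) + len(block) < total:
            for i in range(1, P + 1):
                if C[i] and I[C[i]][1] == len(I[C[i]][0]):   # I[C[i]] is empty
                    assert S[C[i]] or nxt[i] == len(blocks[i]), "claim (ii)"
                    A[C[i]] = 0
                    C[i] = S[C[i]]       # 0 here means tape i is finished
            src = min((i for i in range(1, P + 1) if C[i]),
                      key=lambda i: I[C[i]][0][I[C[i]][1]])
            buf = I[C[src]]
            block.append(buf[0][buf[1]])
            buf[1] += 1
        # F3. The pending read completes; append I[k] to tape m's queue.
        if m is not None:
            L[m] = load(m, k)
            A[k] = 1
            S[B[m]] = k
            B[m] = k
            S[k] = 0                     # clear a stale link (sketch addition)
        # F4. Forecast the next tape to read, and find a free input buffer.
        m = forecast()
        if m is not None:
            free = [j for j in range(1, 2 * P + 1) if A[j] == 0]
            assert free, "claim (i)"
            k = free[0]
        # F5. "Write" the output buffer; the read into I[k] is now pending.
        out.extend(block)
    return out
```

For example, `algorithm_f([[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]], 2)` returns the ten keys in order; feeding it larger inputs is a cheap way to watch both assertions hold.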

The example in Fig. 84 shows how forecasting works when P = 2, assuming
that each block on tape contains only two records. The input buffer contents are
illustrated each time we get to the beginning of step F2. Algorithm F essentially
forms P queues of buffers, with C[i] pointing to the front and B[i] to the rear
of the ith queue, and with S[j] pointing to the successor of buffer I[j]; these
pointers are shown as arrows in Fig. 84. Line 1 illustrates the state of affairs
after initialization: There is one buffer for each input file, and another block is
being read from File 1 (since 03 < 05). Line 2 shows the status of things after the
first block has been merged: We are outputting a block containing |01 02|, and
inputting the next block from File 2 (since 05 < 09). Note that in line 3, three
of the four input buffers are essentially committed to File 2, since we are reading
from that file and we already have a full buffer and a partly full buffer in its
queue. This floating-buffer arrangement is an important feature of Algorithm F,
since we would be unable to proceed in line 4 if we had chosen File 1 instead of
File 2 for the input on line 3.

Fig. 84. Buffer queuing, according to Algorithm F.

In  order  to  prove  that  Algorithm  F is  valid,  we  must  show  two  things: 

i)  There  is  always  an  input  buffer  available  (that  is,  we  can  always  find  a k in 
step  F4). 

ii) If an input buffer is exhausted while merging, its successor is already present
in memory (that is, S[C[i]] is meaningful in step F2).

Suppose  (i)  is  false,  so  that  all  buffers  are  unavailable  at  some  point  when  we 
reach  step  F4.  Each  time  we  get  to  that  step,  the  total  amount  of  unprocessed 
data  among  all  the  buffers  is  exactly  P bufferloads,  just  enough  data  to  fill 
P buffers  if  it  were  redistributed,  since  we  are  inputting  and  outputting  data 
at  the  same  rate.  Some  of  the  buffers  are  only  partially  full;  but  at  most  one 
buffer  for  each  file  is  partially  full,  so  at  most  P buffers  are  in  that  condition.  By 
hypothesis all 2P of the buffers are unavailable; therefore at least P of them must
be  completely  full.  This  can  happen  only  if  P are  full  and  P are  empty,  otherwise 
we  would  have  too  much  data.  But  at  most  one  buffer  can  be  unavailable  and 
empty  at  any  one  time;  hence  (i)  cannot  be  false. 

Suppose  (ii)  is  false,  so  that  we  have  no  unprocessed  records  in  memory, 
for  some  file,  but  the  current  output  buffer  is  not  yet  full.  By  the  principle  of 
forecasting,  we  must  have  no  more  than  one  block  of  data  for  each  of  the  other 
files,  since  we  do  not  read  in  a block  for  a file  unless  that  block  will  be  needed 
before  the  buffers  on  any  other  file  are  exhausted.  Therefore  the  total  number  of 
unprocessed records amounts to at most P − 1 blocks; adding the unfilled output
buffer  leads  to  less  than  P bufferloads  of  data  in  memory,  a contradiction. 
This argument establishes the validity of Algorithm F; and it also indicates
the  possibility  of  pathological  circumstances  under  which  the  algorithm  just 
barely  avoids  disaster.  An  important  subtlety  that  we  have  not  mentioned, 
regarding  the  possibility  of  equal  keys,  is  discussed  in  exercise  5.  See  also 
exercise 4, which considers the case P = 1.

One way to terminate Algorithm F gracefully is to set L[m] to ∞ in step F3
if the block just read is the last of a run. (It is customary to indicate the end of
a run in some special way.) After all of the data on all of the files has been read,
we will eventually find all of the L's equal to ∞ in step F4; then it is usually
possible to begin reading the first blocks of the next run on each file, beginning
initialization of the next merge phase as the final P + 1 blocks are output.

Thus  we  can  keep  the  output  tape  going  at  essentially  full  speed,  without 
reading more than one tape at a time. An exception to this rule occurs in step F1,
where it would be beneficial to read several tapes at once in order to get things
going in the beginning; but step F1 can usually be arranged to overlap with the
preceding  part  of  the  computation. 

The  idea  of  looking  at  the  last  record  in  each  block,  to  predict  which  buffer 
will  empty  first,  was  discovered  in  1953  by  F.  E.  Holberton.  The  technique  was 
first published by E. H. Friend [JACM 3 (1956), 144-145, 165]. His rather
complicated algorithm used 3P input buffers, with three dedicated to each
input file; Algorithm F improves the situation by making use of floating buffers,
allowing any single file to claim as many as P + 1 input buffers at once, yet
never needing more than 2P in all. A discussion of merging with fewer than 2P
input  buffers  appears  at  the  end  of  this  section.  Some  interesting  improvements 
to  Algorithm  F are  discussed  in  Section  5.4.9. 

Comparative  behavior  of  merge  patterns.  Let  us  now  use  what  we  know 
about  tapes  and  merging  to  compare  the  effectiveness  of  the  various  merge 
patterns that we have studied in Sections 5.4.2 through 5.4.5. It is very
instructive to work out the details when each method is applied to the same task.
Consider therefore the problem of sorting a file whose records each contain 100
characters, when there are 100,000 character positions of memory available for
data storage, not counting the space needed for the program and its auxiliary
variables, or the space occupied by links in a selection tree. (Remember that
we  are  pretending  to  live  in  the  days  when  memories  were  small.)  The  input 
appears  in  random  order  on  tape,  in  blocks  of  5000  characters  each,  and  the 
output  is  to  appear  in  the  same  format.  There  are  five  scratch  tapes  to  work 
with,  in  addition  to  the  unit  containing  the  input  tape. 

The  total  number  of  records  to  be  sorted  is  100,000,  but  this  information  is 
not  known  in  advance  to  the  sorting  algorithm. 

The  foldout  illustration  in  Chart  A summarizes  the  actions  that  transpire 
when  ten  different  merging  schemes  are  applied  to  this  data.  The  best  way  to  look 
at  this  important  illustration  is  to  imagine  that  you  are  actually  watching  the 
sort  take  place:  Scan  each  line  slowly  from  left  to  right,  pretending  that  you  can 
actually  see  six  tapes  reading,  writing,  rewinding,  and/or  reading  backwards,  as 
indicated on the diagram. During a P-way merge the input tapes will be moving
only  1/P  times  as  often  as  the  output  tape.  When  the  original  input  tape  has 
been  completely  read  (and  rewound  “with  lock”),  Chart  A assumes  that  a skilled 
computer  operator  dismounts  it  and  replaces  it  with  a scratch  tape,  in  just  30 
seconds.  In  examples  2,  3,  and  4 this  is  “critical  path  time”  when  the  computer 
is  idly  waiting  for  the  operator  to  finish;  but  in  the  remaining  examples,  the 
dismount-reload  operation  is  overlapped  by  other  processing. 

Example 1. Read-forward balanced merge. Let's review the specifications
of  the  problem:  The  records  are  100  characters  long,  there  is  enough  internal 
memory  to  hold  1000  records  at  a time,  and  each  block  on  the  input  tape  contains 
5000  characters  (50  records).  There  are  100,000  records  (=  10,000,000  characters 
= 2000  blocks)  in  all. 

We  are  free  to  choose  the  block  size  for  intermediate  files.  A six-tape 
balanced  merge  uses  three-way  merging,  so  the  technique  of  Algorithm  F calls  for 
8 buffers;  we  may  therefore  use  blocks  containing  1000/8  = 125  records  (=  12500 
characters)  each. 

The initial distribution pass can make use of replacement selection
(Algorithm 5.4.1R), and in order to keep the tapes running smoothly we may use two
input buffers of 50 records each, plus two output buffers of 125 records each.
This  leaves  room  for  650  records  in  the  replacement  selection  tree.  Most  of  the 
initial  runs  will  therefore  be  about  1300  records  long  (10  or  11  blocks);  it  turns 
out  that  78  initial  runs  are  produced  in  Chart  A,  the  last  one  being  rather  short. 
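The buffer arithmetic in this and the following examples follows one pattern, sketched below. The function name is ours, and `typical_run = 2 * tree` uses the usual rule of thumb that replacement selection produces runs about twice the size of its tree when the input is random.

```python
def buffer_plan(memory_records, p, input_block=50):
    """Split memory (measured in records) the way the Chart A examples do.

    Merging uses 2P + 2 equal buffers (2P input, 2 output, as in Algorithm F).
    The distribution pass keeps two input buffers of input_block records and
    two output buffers of the intermediate block size; the rest goes to the
    replacement-selection tree.
    """
    block = memory_records // (2 * p + 2)
    tree = memory_records - 2 * input_block - 2 * block
    typical_run = 2 * tree          # rule of thumb: runs about twice the tree size
    return block, tree, typical_run

for p in (3, 4, 5):
    print(p, buffer_plan(1000, p))
```

With 1000 records of memory this gives (125, 650, 1300) for three-way merging, (100, 700, 1400) for four-way, and (83, 734, 1468) for five-way, matching the figures quoted for the balanced, tape-splitting, and polyphase examples.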

The first merge pass indicated shows nine runs merged to tape 4, instead of
alternating between tapes 4, 5, and 6. This makes it possible to do useful work
while the computer operator is loading a scratch tape onto unit 6; since the total
number S of runs is known once the initial distribution has been completed, the
algorithm knows that ⌈S/9⌉ runs should be merged to tape 4, then ⌈(S − 3)/9⌉
to tape 5, then ⌈(S − 6)/9⌉ to tape 6.
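These ceilings can be computed with integer arithmetic. A short sketch (helper names ours):

```python
def ceil_div(a, b):
    """Integer ceiling of a/b for b > 0 (also correct for negative a)."""
    return -(-a // b)

def first_merge_targets(s):
    """Runs to write on tapes 4, 5, 6 in the first three-way merge pass."""
    return [ceil_div(s, 9), ceil_div(s - 3, 9), ceil_div(s - 6, 9)]

print(first_merge_targets(78))   # [9, 9, 8]
```

For every S the three counts add up to ⌈S/3⌉, the number of three-way merges in the pass, so no merged run is lost or duplicated by front-loading tape 4.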

The  entire  sorting  procedure  for  this  example  can  be  summarized  in  the 
following  way,  using  the  notation  introduced  in  Section  5.4.2: 


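$$
\begin{array}{cccccc}
1^{26} & 1^{26} & 1^{26} & - & - & - \\
- & - & - & 3^{9} & 3^{9} & 3^{8} \\
9^{3} & 9^{3} & 9^{2}\,6^{1} & - & - & - \\
- & - & - & 27^{1} & 27^{1} & 24^{1} \\
78^{1} & - & - & - & - & -
\end{array}
$$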

Example 2. Read-forward polyphase merge. The second example in
Chart A carries out the polyphase merge, according to Algorithm 5.4.2D. In
this case we do five-way merging, so the memory is split into 12 buffers of 83
records each. During the initial replacement selection we have two 50-record
input buffers and two 83-record output buffers, leaving 734 records in the tree;
so the initial runs this time are about 1468 records long (17 or 18 blocks). The
situation illustrated shows that S = 70 initial runs were obtained, the last two
actually being only four blocks and one block long, respectively. The merge
pattern  can  be  summarized  thus: 


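$$
\begin{array}{cccccc}
0^{13}\,1^{18} & 0^{13}\,1^{17} & 0^{13}\,1^{15} & 0^{12}\,1^{12} & 0^{8}\,1^{8} & - \\
1^{15} & 1^{14} & 1^{12} & 1^{8} & - & 0^{8}\,1^{4}\,2^{1}\,5^{3} \\
1^{7} & 1^{6} & 1^{4} & - & 4^{8} & 1^{4}\,2^{1}\,5^{3} \\
1^{3} & 1^{2} & - & 8^{4} & 4^{4} & 2^{1}\,5^{3} \\
1^{1} & - & 16^{1}\,19^{1} & 8^{2} & 4^{2} & 5^{2} \\
- & 34^{1} & 19^{1} & 8^{1} & 4^{1} & 5^{1} \\
70^{1} & - & - & - & - & -
\end{array}
$$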

Curiously,  polyphase  actually  took  about  25  seconds  longer  than  the  far  less 
sophisticated  balanced  merge!  There  are  two  main  reasons  for  this: 

1)  Balanced  merge  was  particularly  lucky  in  this  case,  since  S = 78  is  just 
less  than  a perfect  power  of  3.  If  82  initial  runs  had  been  produced,  the  balanced 
merge  would  have  needed  an  extra  pass. 

2)  Polyphase  merge  wasted  30  seconds  while  the  input  tape  was  being 
changed,  and  a total  of  more  than  5 minutes  went  by  while  it  was  waiting  for 
rewind  operations  to  be  completed.  By  contrast  the  balanced  merge  needed 
comparatively  little  rewind  time.  In  the  second  phase  of  the  polyphase  merge, 
13  seconds  were  saved  because  the  8 dummy  runs  on  tape  6 could  be  assumed 
present even while that tape was rewinding; but no other rewind overlap
occurred. Therefore polyphase lost out even though it required significantly less
read/write  time. 
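Point 1) can be checked by counting passes. Each pass of a P-way balanced merge turns S runs into ⌈S/P⌉ runs, so the number of passes is the smallest k with P^k ≥ S. A sketch (the function name is ours):

```python
def balanced_merge_passes(s, p=3):
    """Merge passes needed by a P-way balanced merge starting from s runs."""
    passes = 0
    while s > 1:
        s = -(-s // p)       # ceil(s / p) runs remain after one pass
        passes += 1
    return passes

print(balanced_merge_passes(78), balanced_merge_passes(82))   # 4 5
```

Since 3^4 = 81, any S from 28 through 81 needs four passes, while 82 runs force a fifth.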


Example 3. Read-forward cascade merge. This case is analogous to the
preceding, but using Algorithm 5.4.3C. The merging may be summarized thus:

[Merge-pattern summary not recoverable from this copy; see Chart A.]

(Remember  to  watch  each  of  these  examples  in  action,  by  scanning  Chart  A in 
the  foldout  illustration.) 


Example  4.  Tape-splitting  polyphase  merge.  This  procedure,  described  at 
the  end  of  Section  5.4.2,  allows  most  of  the  rewind  time  to  be  overlapped.  It  uses 
four-way  merging,  so  we  divide  the  memory  into  ten  100-record  buffers;  there  are 
700  records  in  the  replacement  selection  tree,  so  it  turns  out  that  72  initial  runs 
are  formed.  The  last  run,  again,  is  very  short.  A distribution  scheme  analogous 
to Algorithm 5.4.2D has been used, followed by a simple but somewhat ad hoc
method of placing dummy runs:


[Merge-pattern summary not recoverable from this copy; see Chart A.]

This  turns  out  to  give  the  best  running  time  of  all  the  examples  in  Chart  A that 
do  not  read  backwards.  Since  S will  never  be  very  large,  it  would  be  possible  to 
develop  a more  complicated  algorithm  that  places  dummy  runs  in  an  even  better 
way;  see  Eq.  5.4.2-(26). 

Example  5.  Cascade  merge  with  rewind  overlap.  This  procedure  runs 
almost  as  fast  as  the  previous  example,  although  the  algorithm  governing  it  is 
much  simpler.  We  simply  use  the  cascade  sort  method  as  in  Algorithm  5.4.3C 
for the initial distribution, but with T = 5 instead of T = 6. Then each phase
of  each  “cascade”  staggers  the  tapes  so  that  we  ordinarily  don’t  write  on  a tape 
until  after  it  has  had  a chance  to  be  rewound.  The  pattern,  very  briefly,  is 


[Merge-pattern summary not recoverable from this copy; see Chart A.]

Example  6.  Read-backward  balanced  merge.  This  is  like  example  1 but 
with  all  the  rewinding  eliminated: 


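$$
\begin{array}{cccccc}
A_1^{26} & A_1^{26} & A_1^{26} & - & - & - \\
- & - & - & D_3^{9} & D_3^{9} & D_3^{8} \\
A_9^{3} & A_9^{3} & A_9^{2}\,A_6^{1} & - & - & - \\
- & - & - & D_{27}^{1} & D_{27}^{1} & D_{24}^{1} \\
A_{78}^{1} & - & - & - & - & -
\end{array}
$$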
Since there was comparatively little rewinding in example 1, this scheme is not a
great deal better than the read-forward case. In fact, it turns out to be slightly
slower than tape-splitting polyphase, in spite of the fortunate value S = 78.

Example  7.  Read-backward  polyphase  merge.  In  this  example  only  five  of 
the  six  tapes  are  used,  in  order  to  eliminate  the  time  for  rewinding  and  changing 
the  input  tape.  Thus,  the  merging  is  only  four-way,  and  the  buffer  allocation 
is like that in examples 4 and 5. A distribution like Algorithm 5.4.2D is used,
but  with  alternating  directions  of  runs,  and  with  tape  1 fixed  as  the  final  output 
tape.  First  an  ascending  run  is  written  on  tape  1;  then  descending  runs  on  tapes 
2,  3,  4;  then  ascending  runs  on  2,  3,  4;  then  descending  on  1,  2,  3;  etc.  Each  time 
we  switch  direction,  replacement  selection  usually  produces  a shorter  run,  so  it 
turns  out  that  77  initial  runs  are  formed  instead  of  the  72  in  examples  4 and  5. 

This  procedure  results  in  a distribution  of  (22,  21,  19,  15)  runs,  and  the  next 
perfect distribution is (29, 56, 52, 44). Exercise 5.4.4-5 shows how to generate
strings  of  merge  numbers  that  can  be  used  to  place  dummy  runs  in  optimum 
positions;  such  a procedure  is  feasible  in  practice  because  the  finiteness  of  a tape 
reel  ensures  that  S is  never  too  large.  Therefore  the  example  in  Chart  A has 
been  constructed  using  such  a method  for  dummy  run  placement  (see  exercise  7). 
This  turns  out  to  be  the  fastest  of  all  the  examples  illustrated. 

Example  8.  Read-backward  cascade  merge.  As  in  example  7,  only  five 
tapes  are  used  here.  This  procedure  follows  Algorithm  5.4.3C,  using  rewind  and 
forward  read  to  avoid  one-way  merging  (since  rewinding  is  more  than  twice  as 
fast  as  reading  on  MIXT  units).  Distribution  is  therefore  the  same  as  in  example  5. 
The  pattern  may  be  summarized  briefly  as  follows,  using  | to  denote  rewinding: 


[Merge-pattern summary not recoverable from this copy; see Chart A.]

Example 9. Read-backward oscillating sort. Oscillating sort with T = 5
(Algorithm  5.4.5B)  can  use  buffer  allocation  as  in  examples  4,  5,  7,  and  8,  since 
it  does  four-way  merging.  However,  replacement  selection  does  not  behave  in 
the  same  way,  since  a run  of  length  700  (not  1400  or  so)  is  output  just  before 
entering  each  merge  phase,  in  order  to  clear  the  internal  memory.  Consequently 
85  runs  are  produced  in  this  example,  instead  of  72.  Some  of  the  key  steps  in 
the  process  are 
[Summary of the key steps not recoverable from this copy; see Chart A.]

Example 10. Read-forward oscillating sort. In the final example, replacement
selection is not used because all initial runs must be the same length.
Therefore full core loads of 1000 records are sorted internally whenever an
initial run is required; this makes S = 100. Some key steps in the process are

[Summary of the key steps not recoverable from this copy; see Chart A.]

This  routine  turns  out  to  be  slowest  of  all,  partly  because  it  does  not  use 
replacement selection, but mostly because of its rather awkward ending (a
two-way merge).

Fig. 85. A somewhat misleading way to compare merge patterns.


Estimating the running time. Let's see now how to figure out the
approximate execution time of a sorting method using MIXT tapes. Could we
have  predicted  the  outcomes  shown  in  Chart  A without  carrying  out  a detailed 
simulation? 

One way that has traditionally been used to compare different merge
patterns is to superimpose graphs such as we have seen in Figs. 70, 74, and 78.
These graphs show the effective number of passes over the data, as a function of
the number of initial runs, assuming that each initial run has approximately the
same length. (See Fig. 85.) But this is not a very realistic comparison, because
we have seen that different methods lead to different numbers of initial runs;
furthermore there is a different overhead time caused by the relative frequency
of interblock gaps, and the rewind time also has significant effects. All of these
machine-dependent features make it impossible to prepare charts that provide
a valid machine-independent comparison of the methods. On the other hand,
Fig. 85 does show us that, except for balanced merge, the effective number
of passes can be reasonably well approximated by smooth curves of the form
α ln S + β. Therefore we can make a fairly good comparison of the methods
in  any  particular  situation,  by  studying  formulas  that  approximate  the  running 
time.  Our  goal,  of  course,  is  to  find  formulas  that  are  simple  yet  sufficiently 
realistic. 

Let  us  now  attempt  to  develop  such  formulas,  in  terms  of  the  following 
parameters: 

N = number  of  records  to  be  sorted, 

C = number  of  characters  per  record, 

M = number of character positions available in the internal memory (assumed to be a multiple of C),


PRACTICAL  CONSIDERATIONS  FOR  TAPE  MERGING  331 


5.4.6 


$\tau$ = number of seconds to read or write one character,
$\rho\tau$ = number of seconds to rewind over one character,
$\sigma\tau$ = number of seconds for stop/start time delay,

$\gamma$ = number of characters per interblock gap,

$\delta$ = number of seconds for operator to dismount and replace input tape,

$B_i$ = number of characters per block in the unsorted input,

$B_o$ = number of characters per block in the sorted output.

For MIXT we have $\tau = 1/60000$, $\rho = 2/5$, $\sigma = 300$, $\gamma = 480$. The example application treated above has $N = 100000$, $C = 100$, $M = 100000$, $\delta = 30$, $B_i = B_o = 5000$. These parameters are usually the machine and data characteristics that affect sorting time most critically (although rewind time is often given by a more complicated expression than a simple ratio $\rho$). Given the parameters above and a merge pattern, we shall compute further quantities such as


P = maximum order of merge in the pattern,

P' = number of records in replacement selection tree,

S = number of initial runs,

$\pi = \alpha \ln S + \beta$ = approximate average number of times each character is read and written, not counting the initial distribution or the final merge,

$\pi' = \alpha' \ln S + \beta'$ = approximate average number of times rewinding over each character during intermediate merge phases,

B = number of characters per block in the intermediate merge phases,

$\omega_i, \omega, \omega_o$ = "overhead ratio," the effective time required to read or write a character (due to gaps and stop/start) divided by the hardware time $\tau$.


The examples of Chart A have chosen block and buffer sizes according to the formula

$$B = \left\lfloor \frac{M}{C(2P+2)} \right\rfloor C, \qquad (1)$$

so that the blocks can be as large as possible consistent with the buffering scheme of Algorithm F. (In order to avoid trouble during the final pass, P should be small enough that (1) makes $B \ge B_o$.) The size of the tree during replacement selection is then

$$P' = (M - 2B_i - 2B)/C. \qquad (2)$$

For random data the number of initial runs S can be estimated as

$$S \approx \left\lceil \frac{N}{2P'} + \frac{7}{6} \right\rceil, \qquad (3)$$

using the results of Section 5.4.1. Assuming that $B_i \le B$ and that the input tape can be run at full speed during the distribution (see below), it takes about $NC\omega_i\tau$ seconds to distribute the initial runs, where

$$\omega_i = (B_i + \gamma)/B_i. \qquad (4)$$
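As a quick check on formulas (1) through (3), here is a small Python sketch that evaluates them for the MIXT example. The function name `chart_a_parameters` is a hypothetical choice, its defaults are the N, C, M, and $B_i$ values quoted above, and P' is rounded down to a whole number of records (an assumption; the text does not say how a fractional value would be handled). Exact fractions keep the ceiling in (3) free of floating-point surprises.

```python
from fractions import Fraction
import math

def chart_a_parameters(P, N=100000, C=100, M=100000, B_i=5000):
    """Hypothetical helper: evaluate formulas (1)-(3) for a P-way merge."""
    # (1): largest block size, a multiple of C, that lets 2P + 2 buffers fit in M
    B = (M // (C * (2 * P + 2))) * C
    # (2): records left over for the replacement selection tree (rounded down)
    P_tree = (M - 2 * B_i - 2 * B) // C
    # (3): expected number of initial runs for random data
    S = math.ceil(Fraction(N, 2 * P_tree) + Fraction(7, 6))
    return B, P_tree, S

for P in (3, 4, 5):
    print(P, chart_a_parameters(P))
```

For P = 3, 4, and 5 this gives (12500, 650, 79), (10000, 700, 73), and (8300, 734, 70), the B, P', and S entries of Table 1 for the examples that rely on formula (3).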

While merging, the buffering scheme allows simultaneous reading, writing, and computing, but the frequent switching between input tapes means that we must add the stop/start time penalty; therefore we set

$$\omega = (B + \gamma + \sigma)/B, \qquad (5)$$

and the merge time is approximately

$$(\pi + \rho\pi')NC\omega\tau. \qquad (6)$$

This formula penalizes rewind slightly, since $\omega$ includes stop/start time, but other considerations, such as rewind interlock and the penalty for reading from load point, usually compensate for this. The final merge pass, assuming that $B_o \le B$, is constrained by the overhead ratio

$$\omega_o = (B_o + \gamma)/B_o. \qquad (7)$$

We may estimate the running time of the final merge and rewind as

$$NC(1 + \rho)\omega_o\tau;$$

in practice it might take somewhat longer due to the presence of unequal block lengths (input and output are not synchronized as in Algorithm F), but the running time will be pretty much the same for all merge patterns.
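The overhead ratios (4), (5), (7) and the two time components that do not depend on the merge pattern can be evaluated the same way. In this Python sketch the function and parameter names are hypothetical; the defaults are the MIXT values above ($\tau = 1/60000$, $\rho = 2/5$, $\sigma = 300$, $\gamma = 480$, $B_i = B_o = 5000$, N = 100000, C = 100).

```python
def fixed_phase_times(B, N=100000, C=100, tau=1/60000, rho=0.4,
                      sigma=300, gamma=480, B_i=5000, B_o=5000):
    """Hypothetical helper: overhead ratios (4), (5), (7), plus the
    distribution time and the final merge-and-rewind time, in seconds."""
    omega_i = (B_i + gamma) / B_i          # (4) distribution pass
    omega = (B + gamma + sigma) / B        # (5) intermediate merges
    omega_o = (B_o + gamma) / B_o          # (7) final merge
    distribution = N * C * omega_i * tau
    final_merge_and_rewind = N * C * (1 + rho) * omega_o * tau
    return omega_i, omega, omega_o, distribution, final_merge_and_rewind

print(fixed_phase_times(B=10000))
```

With B = 10000 this gives $\omega = 1.078$, about 183 seconds for the distribution, and about 256 seconds for the final merge and rewind; only the middle term, (6), distinguishes one merge pattern from another.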

Before  going  into  more  specific  formulas  for  individual  patterns,  let  us  try 
to  justify  two  of  the  assumptions  made  above. 

a) Can replacement selection keep up with the input tape? In the examples of Chart A it probably can, since it takes about ten iterations of the inner loop of Algorithm 5.4.1R to select the next record, and we have $C\omega_i\tau > 1667$ microseconds in which to do this. With careful programming of the replacement selection loop, this can be done on most machines (even in the 1970s). Notice that the situation is somewhat less critical while merging: The computation time per record is almost always less than the tape time per record during a P-way merge, since P isn't very large.

b) Should we really choose B to be the maximum possible buffer size, as in (1)? A large buffer size cuts down the overhead ratio $\omega$ in (5); but it also increases the number of initial runs S, since P' is decreased. It is not immediately clear which factor is more important. Considering the merging time as a function of $x = CP'$, we can express it in the approximate form


for some appropriate constants $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, with $\theta_3 > \theta_4$. Differentiating with respect to x shows that there is some $N_0$ such that for all $N > N_0$ it does not pay to increase x at the expense of buffer size. In the sorting application of Chart A, for example, $N_0$ turns out to be roughly 10000; when sorting more than 10000 records the large buffer size is superior.

Note, however, that with balanced merge the number of passes jumps sharply when S passes a power of P. If an approximation to N is known in advance, the buffer size should be chosen so that S will most likely be slightly less than a power of P. For example, the buffer size for the first line of Chart A was 12500; since S = 78, this was very satisfactory, but if S had turned out to be 82 it would have been much better to decrease the buffer size a little.
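The advice about balanced merge can be turned into a simple search. The Python sketch below is one possible rule, not the procedure used for Chart A: it assumes N is known, scans block sizes that are multiples of C from the maximum allowed by (1) down to $B_o$, and keeps the largest B that minimizes the number of P-way merge passes $\lceil \log_P S \rceil$, with S estimated by (3). It ignores the small rise in the overhead ratio (5) caused by shrinking B. All names are hypothetical.

```python
from fractions import Fraction
import math

def runs(N, P_tree):
    # Formula (3): expected number of initial runs for random data.
    return math.ceil(Fraction(N, 2 * P_tree) + Fraction(7, 6))

def merge_passes(S, P):
    # Smallest k with P**k >= S, in integers to avoid log rounding at exact powers.
    k, reach = 0, 1
    while reach < S:
        k, reach = k + 1, reach * P
    return k

def choose_buffer(N, P=3, C=100, M=100000, B_i=5000, B_o=5000):
    """Hypothetical rule: largest block size (a multiple of C, between B_o and
    the maximum of formula (1)) that minimizes the balanced-merge passes."""
    B_max = (M // (C * (2 * P + 2))) * C
    best_B, best_passes = None, None
    for B in range(B_max, B_o - 1, -C):          # try the largest B first
        P_tree = (M - 2 * B_i - 2 * B) // C       # formula (2)
        passes = merge_passes(runs(N, P_tree), P)
        if best_passes is None or passes < best_passes:
            best_B, best_passes = B, passes
    return best_B

print(choose_buffer(100000))   # 12500: S stays below 3**4 = 81
print(choose_buffer(105000))   # 12100: a slightly smaller block keeps S at 81
```

With N = 105000 the maximal block of 12500 characters would give an estimated S = 82, one run past $3^4$; dropping to 12100 enlarges the selection tree just enough to bring S back to 81.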

Formulas for the ten examples. Returning to Chart A, let us try to give formulas that approximate the running time in each of the ten methods. In most cases the basic formula

$$NC\omega_i\tau + (\pi + \rho\pi')NC\omega\tau + (1 + \rho)NC\omega_o\tau \qquad (9)$$

will be a sufficiently good approximation to the overall sorting time, once we have specified the number of intermediate merge passes $\pi = \alpha \ln S + \beta$ and the number of intermediate rewind passes $\pi' = \alpha' \ln S + \beta'$. Sometimes it is necessary to add a further correction to (9); details for each method can be worked out as follows:

Example 1. Read-forward balanced merge. The formulas

$$\pi = \lceil \ln S/\ln P \rceil - 1, \qquad \pi' = \lceil \ln S/\ln P \rceil / P$$

may be used for P-way merging on 2P tapes.
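The ceilings in Example 1 are easiest to get right with integer arithmetic, since a floating-point logarithm can land just above an exact power of P. A minimal Python sketch (the function name is hypothetical):

```python
def balanced_passes(S, P):
    """Example 1: (pi, pi') for read-forward P-way balanced merge on 2P tapes."""
    k, reach = 0, 1
    while reach < S:           # k = ceil(log_P S), computed in integers
        k, reach = k + 1, reach * P
    pi = k - 1                 # intermediate merge passes (final merge excluded)
    pi_rewind = k / P          # rewind passes
    return pi, pi_rewind

print(balanced_passes(79, 3))
```

For S = 79 and P = 3 this gives $\pi = 3$ and $\pi' = 4/3$. The first row of Table 1 appears to use the smooth form instead, with $\alpha = 1/\ln P \approx 0.910$, $\beta = -1$, and $\alpha' = 1/(P \ln P) \approx 0.303$, which amounts to dropping the ceiling.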

Example 2. Read-forward polyphase merge. We may take $\pi' \approx \pi$, since every phase is usually followed by a rewind of about the same length as the previous merge. From Table 5.4.2-1 we get the values $\alpha \approx 0.795$, $\beta \approx 0.864 - 2$, in the case of six tapes. (We subtract 2 because the table entry includes the initial and final passes as well as the intermediate ones.) The time for rewinding the input tape after the initial distribution, namely $\rho NC\omega_i\tau + \delta$, should be added to (9).

Example 3. Read-forward cascade merge. Table 5.4.3-1 gives the values $\alpha \approx 0.773$, $\beta \approx 0.808 - 2$. Rewind time is comparatively difficult to estimate; perhaps setting $\pi' \approx \pi$ is accurate enough. As in example 2, we need to add the initial rewind time to (9).

Example 4. Tape-splitting polyphase merge. Table 5.4.2-6 tells us that $\alpha \approx 0.752$, $\beta \approx 1.024 - 2$. The rewind time is almost overlapped except after the initialization ($\rho NC\omega_i\tau + \delta$) and two phases near the end ($2\rho NC\omega\tau$ times 36 percent). We may also subtract 0.18 from $\beta$ since the first half phase is overlapped by the initial rewind.

Example 5. Cascade merge with rewind overlap. In this case we use Table 5.4.3-1 for T = 5, to get $\alpha \approx 0.897$, $\beta \approx 0.800 - 2$. Nearly all of the unoverlapped rewind occurs just after the initial distribution and just after each two-way merge. After a perfect initial distribution, the longest tape contains about 1/g of the data, where g is the “growth ratio.” After each two-way merge the amount of rewind in the six-tape case is (see exercise 5.4.3-5), hence

the amount of rewind after two-way merges in the T-tape case can be shown to be approximately

$$\frac{2}{2T-1}\left(1 - \cos\frac{4\pi}{2T-1}\right)$$

of the file. In our case, T = 5, this is $\frac{2}{9}(1 - \cos 80^\circ) \approx 0.184$ of the file, and the number of times it occurs is $0.946 \ln S + 0.796 - 2$.
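A short evaluation of the rewind fraction quoted in Example 5, as a Python sketch (the function name is hypothetical):

```python
import math

def cascade_rewind_fraction(T):
    """Approximate fraction of the file rewound after each two-way merge
    in a T-tape cascade merge, per the formula in example 5."""
    m = 2 * T - 1
    return (2 / m) * (1 - math.cos(4 * math.pi / m))

print(round(cascade_rewind_fraction(5), 3))  # 0.184
```

Multiplying the unrounded value, about 0.1836, by the coefficient 0.946 gives about 0.174, in line with the $\alpha'$ entry 0.173 for example 5 in Table 1.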

Example 6. Read-backward balanced merge. This is like example 1, except that most of the rewinding is eliminated. The change in direction from forward to backward causes some delays, but they are not significant. There is a 50-50 chance that rewinding will be necessary before the final pass, so we may take $\pi' = 1/(2P)$.

Example 7. Read-backward polyphase merge. Since replacement selection in this case produces runs that change direction about every P times, we must replace (3) by another formula for S. A reasonably good approximation, suggested by exercise 5.4.1-24, is $S = \lceil N(3 + 1/P)/(6P') \rceil + 1$. All rewind time is eliminated, and Table 5.4.2-1 gives $\alpha \approx 0.863$, $\beta \approx 0.921 - 2$.

Example 8. Read-backward cascade merge. From Table 5.4.3-1 we have $\alpha \approx 0.897$, $\beta \approx 0.800 - 2$. The rewind time can be estimated as twice the difference between “passes with copying” and “passes without copying” in that table, plus 1/(2P) in case the final merge must be preceded by rewinding to get ascending order.

Example 9. Read-backward oscillating sort. In this case replacement selection has to be started and stopped many times; bursts of P - 1 to 2P - 1 runs are distributed at a time, averaging P in length; the average length of runs therefore turns out to be approximately $P'(2P - 4/3)/P$, and we may estimate $S = \lceil N/((2 - 4/(3P))P') \rceil + 1$. A little time is used to switch from merging to distribution and vice versa; this is approximately the time to read in P' records from the input tape, namely $P'C\omega_i\tau$, and it occurs about S/P times. Rewind time and merging time may be estimated as in example 6.

Example 10. Read-forward oscillating sort. This method is not easy to analyze, because the final “cleanup” phases performed after the input is exhausted are not as efficient as the earlier phases. Ignoring this troublesome aspect, and simply calling it one extra pass, we can estimate the merging time by setting $\alpha = 1/\ln P$, $\beta = 0$, and $\pi' = \pi/P$. The distribution of runs is somewhat different in this case, since replacement selection is not used; we set $P' = M/C$ and $S = \lceil N/P' \rceil$. With care we will be able to overlap computing, reading, and writing during the distribution, with an additional factor of about $(M + 2B)/M$ in the overhead. The “mode-switching” time mentioned in example 9 is not needed in the present case because it is overlapped by rewinding. So the estimated sorting time in this case is (9) plus $2BNC\omega_i\tau/M$.
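The three alternative run-count estimates in examples 7, 9, and 10 can be checked against the S column of Table 1 with the following Python sketch. The function names are hypothetical, and the arguments are the MIXT values N = 100000, P = 4, P' = 700, M = 100000, C = 100.

```python
from fractions import Fraction
import math

def runs_read_backward_polyphase(N, P, P_tree):
    # Example 7: S = ceil(N (3 + 1/P) / (6 P')) + 1
    return math.ceil(Fraction(N) * (3 + Fraction(1, P)) / (6 * P_tree)) + 1

def runs_oscillating_backward(N, P, P_tree):
    # Example 9: S = ceil(N / ((2 - 4/(3P)) P')) + 1
    return math.ceil(Fraction(N) / ((2 - Fraction(4, 3 * P)) * P_tree)) + 1

def runs_oscillating_forward(N, M, C):
    # Example 10: no replacement selection, so P' = M/C and S = ceil(N/P')
    return math.ceil(Fraction(N) / Fraction(M, C))

print(runs_read_backward_polyphase(100000, 4, 700))   # 79
print(runs_oscillating_backward(100000, 4, 700))      # 87
print(runs_oscillating_forward(100000, 100000, 100))  # 100
```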
Table 1

SUMMARY OF SORTING TIME ESTIMATES

Ex.  P  B      P'   S    ω      α      β       α'     β'      (9)   Additions to (9)  Est. total  Actual total
1    3  12500  650  79   1.062  0.910  -1.000  0.303  0.000   1064                    1064        1076
2    5  8300   734  70   1.094  0.795  -1.136  0.795  -1.136  1010  ρNCω_iτ + δ       1113        1103
3    5  8300   734  70   1.094  0.773  -1.192  0.773  -1.192  972   ρNCω_iτ + δ       1075        1127
4    4  10000  700  73   1.078  0.752  -0.994  0.000  0.720   844   ρNCω_iτ + δ       947         966
5    4  10000  700  73   1.078  0.897  -1.200  0.173  0.129   972                     972         992
6    3  12500  650  79   1.062  0.910  -1.000  0.000  0.167   981                     981         980
7    4  10000  700  79   1.078  0.863  -1.079  0.000  0.000   922                     922         907
8    4  10000  700  73   1.078  0.897  -1.200  0.098  0.117   952                     952         949
9    4  10000  700  87   1.078  0.721  -1.000  0.000  0.125   846   P'SCω_iτ/P        874         928
10   4  10000  n/a  100  1.078  0.721  0.000   0.180  0.000   1095  2BNCω_iτ/M        1131        1158
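Formula (9) itself is simple to evaluate once $\alpha$, $\beta$, $\alpha'$, $\beta'$, S, and B are known. The Python sketch below (hypothetical names; MIXT defaults as above) computes the "(9)" column from a row of Table 1. It does not include the per-example additions.

```python
import math

def estimate_9(S, alpha, beta, alpha_r, beta_r, B,
               N=100000, C=100, tau=1/60000, rho=0.4,
               sigma=300, gamma=480, B_i=5000, B_o=5000):
    """Hypothetical helper: evaluate the basic formula (9), in seconds."""
    NCt = N * C * tau                      # seconds to pass all data once
    omega_i = (B_i + gamma) / B_i          # (4)
    omega = (B + gamma + sigma) / B        # (5)
    omega_o = (B_o + gamma) / B_o          # (7)
    pi = alpha * math.log(S) + beta        # intermediate merge passes
    pi_r = alpha_r * math.log(S) + beta_r  # intermediate rewind passes
    return (NCt * omega_i                          # initial distribution
            + (pi + rho * pi_r) * NCt * omega      # (6) intermediate merging
            + (1 + rho) * NCt * omega_o)           # final merge and rewind

# Rows 2, 7, and 9 of Table 1: (example, S, alpha, beta, alpha', beta', B)
rows = [(2, 70, 0.795, -1.136, 0.795, -1.136, 8300),
        (7, 79, 0.863, -1.079, 0.000, 0.000, 10000),
        (9, 87, 0.721, -1.000, 0.000, 0.125, 10000)]
for ex, *args in rows:
    print(ex, round(estimate_9(*args), 1))
```

Rows 2, 7, and 9 come out within a second of the tabulated values. Not every row is matched this closely (example 1, for instance, evaluates to about 1060 rather than 1064), so the sketch should be read as an illustration of formula (9) rather than an exact reconstruction of how the table was computed.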

Table 1 shows that the estimates are not too bad in these examples, although in a few cases there is a discrepancy of 50 seconds or so. The formulas in examples 2 and 3 indicate that cascade merge should be preferable to polyphase on six tapes, yet in practice polyphase was better. The reason is that graphs like Fig. 85 (which shows the five-tape case) are more nearly straight lines for the polyphase algorithm; cascade is superior to polyphase on six tapes for $14 \le S \le 15$ and $43 \le S \le 55$, near the “perfect” cascade numbers 15 and 55, but the polyphase distribution of Algorithm 5.4.2D is equal or better for all other $S \le 100$. Cascade will win over polyphase as $S \to \infty$, but S doesn't actually approach $\infty$. The underestimate in example 9 is due to similar circumstances; polyphase was superior to oscillating even though the asymptotic theory tells us that oscillating will be better for large S.

Some  miscellaneous  remarks.  It  is  now  appropriate  to  make  a few  more  or 
less  random  observations  about  tape  merging. 

• The  formulas  above  show  that  the  cost  of  tape  sorting  is  essentially  a 
function of N times C, not of N and C independently. Except for a few relatively
minor  considerations  (such  as  the  fact  that  B was  taken  to  be  a multiple  of  C), 
our  formulas  say  that  it  takes  about  as  long  to  sort  one  million  records  of  10 
characters  each  as  to  sort  100,000  records  of  100  characters  each.  Actually  there 
may  be  a difference,  not  revealed  in  our  formulas,  because  of  the  space  used  by 
link  fields  during  replacement  selection.  In  any  event  the  size  of  the  key  makes 
hardly  any  difference,  unless  keys  get  so  long  and  complicated  that  internal 
computation  cannot  keep  up  with  the  tapes. 

With long records and short keys it is tempting to “detach” the keys, sort them first, and then somehow rearrange the records as a whole. But this idea doesn't really work; it merely postpones the agony, because the final rearrangement procedure takes about as long as a conventional merge sort would take.

• When writing a sort routine that is to be used repeatedly, it is wise to estimate the running time very carefully and to compare the theory with actual observed performance. Since the theory of sorting has been fairly well developed, this procedure has been known to turn up bugs in the input/output hardware or software on existing systems; the service was substantially slower than it should have been, yet nobody had noticed it until the sorting routine ran too slowly!

• Our  analysis  of  replacement  selection  has  been  carried  out  for  “random” 
files,  but  the  files  that  actually  arise  in  practice  very  often  have  a good  deal  of 
existing  order.  (In  fact,  sometimes  people  will  sort  a file  that  is  already  in  order, 
just  to  be  sure.)  Therefore  experience  has  shown  that  replacement  selection 
is  preferable  to  other  kinds  of  internal  sort,  even  more  so  than  our  formulas 
indicate.  This  advantage  is  slightly  mitigated  in  the  case  of  read-backward 
polyphase  sorting,  since  a number  of  descending  runs  must  be  produced;  indeed, 
R.  L.  Gilstad  (who  first  published  the  polyphase  merge)  originally  rejected  the 
read-backward technique for that reason. But he noticed later that alternating
directions  will  still  pick  up  long  ascending  runs.  Furthermore,  read-backward 
polyphase  is  the  only  standard  technique  that  likes  descending  input  files  as  well 
as  ascending  ones. 

• Another  advantage  of  replacement  selection  is  that  it  allows  simultaneous 
reading,  writing,  and  computing.  If  we  merely  did  the  internal  sort  in  an  obvious 
way  — filling  the  memory,  sorting  it,  then  writing  it  out  as  it  becomes  filled  with 
the  next  load  — the  distribution  pass  would  take  about  twice  as  long. 

The only other internal sort we have discussed that appears to be amenable to simultaneous reading, writing, and computing is heapsort. Suppose for convenience that the internal memory holds 1000 records, and that each block on tape holds 100. Example 10 of Chart A was prepared with the following strategy, letting $B_1 B_2 \ldots B_{10}$ stand for the contents of memory divided into ten 100-record blocks:

Step 0. Fill memory, and make the elements of $B_2 \ldots B_{10}$ satisfy the inequalities for a heap (with smallest element at the root).

Step 1. Make $B_1 \ldots B_{10}$ into a heap, then select out the least 100 records and move them to $B_{10}$.

Step 2. Write out $B_{10}$, while selecting the smallest 100 records of $B_1 \ldots B_9$ and moving them to $B_9$.

Step 3. Read into $B_{10}$, and write out $B_9$, while selecting the smallest 100 records of $B_1 \ldots B_8$ and moving them to $B_8$.

. . .

Step 9. Read into $B_4$, and write out $B_3$, while selecting the smallest 100 records of $B_1 B_2$ and moving them to $B_2$ and while making the heap inequalities valid in $B_5 \ldots B_{10}$.

Step 10. Read into $B_3$, and write out $B_2$, while sorting $B_1$ and while making the heap inequalities valid in $B_4 \ldots B_{10}$.

Step 11. Read into $B_2$, and write out $B_1$, while making the heap inequalities valid in $B_3 \ldots B_{10}$.

Step 12. Read into $B_1$, while making the heap inequalities valid in $B_2 \ldots B_{10}$. Return to step 1. |
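To see why this schedule keeps the tape busy, the following Python sketch lists, for n blocks, which block is read and which is written at each of steps 1 through n + 2, and checks two properties implied by the steps: the blocks go out in the order $B_n, \ldots, B_1$ (smallest records first), and every block is refilled from tape in the step right after it was written out. The function names are hypothetical, and the sketch tracks only the block transfers and the selection targets, not the heap operations themselves.

```python
def heapsort_io_schedule(n=10):
    """Steps 1..n+2 of the block strategy, as tuples
    (step, block read into, block written out, selection work).
    None means no tape transfer in that direction."""
    steps = [(1, None, None, f"heap B1..B{n}; least block to B{n}"),
             (2, None, n, f"least block of B1..B{n - 1} to B{n - 1}")]
    for s in range(3, n):
        top = n + 1 - s                   # selection now works in B1..B_top
        steps.append((s, top + 2, top + 1,
                      f"least block of B1..B{top} to B{top}"))
    steps.append((n, 3, 2, "sort B1"))
    steps.append((n + 1, 2, 1, "heap work only"))
    steps.append((n + 2, 1, None, "heap work only"))
    return steps

def check_schedule(steps, n=10):
    """Blocks leave in the order Bn, ..., B1, and each block is
    refilled in the step right after it is written out."""
    written = [w for _, _, w, _ in steps if w is not None]
    assert written == list(range(n, 0, -1))
    step_written = {w: s for s, _, w, _ in steps if w is not None}
    for s, r, _, _ in steps:
        if r is not None:
            assert step_written[r] == s - 1
    return True

for row in heapsort_io_schedule():
    print(row)
print(check_schedule(heapsort_io_schedule()))
```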
• We have been assuming that the number N of records to be sorted is not known in advance. Actually in most computer applications it would be possible to keep track of the number of records in all files at all times, and we could assume that our computer system is capable of telling us the value of N. How much help would this be? Unfortunately, not very much! We have seen that replacement selection is very advantageous, but it leads to an unpredictable number of initial runs. In a balanced merge we could use information about N to set the buffer size B in such a way that S will probably be just less than a power of P; and in a polyphase distribution with optimum placement of dummy runs we could use information about N to decide what level to shoot for (see Table 5.4.2-2).

• Tape  drives  tend  to  be  the  least  reliable  part  of  a computer.  Therefore  the 
original  input  tape  should  never  be  destroyed  until  it  is  known  that  the  entire  sort 
has  been  satisfactorily  completed.  The  “operator  dismount  time”  is  annoying  in 
some  of  the  examples  of  Chart  A,  but  it  would  be  too  risky  to  overwrite  the  input 
in  view  of  the  probability  that  something  might  go  wrong  during  a long  sort. 

• When  changing  from  forward  write  to  backward  read,  we  could  save  some 
time  by  never  writing  the  last  bufferload  onto  tape;  it  will  just  be  read  back  in 
again  anyway.  But  Chart  A shows  that  this  trick  actually  saves  comparatively 
little  time,  except  in  the  oscillating  sort  where  directions  are  reversed  frequently. 

• Although a large computer system might have lots of tape units, we might be better off not using them all. The percentage difference between $\log_P S$ and $\log_{P+1} S$ is not very great when P is large, and a higher order of merge usually implies a smaller block size. (Consider also the poor computer operator who has to mount all those scratch tapes.) On the other hand, exercise 12 describes an interesting way to make use of additional tape units, grouping them so as to overlap input/output time without increasing the order of merge.
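The claim about $\log_P S$ versus $\log_{P+1} S$ is easy to quantify: the relative saving is $1 - \ln P/\ln(P+1)$, which does not depend on S at all. A small Python sketch (the function name is hypothetical); it deliberately ignores the smaller block size that a higher order of merge forces:

```python
import math

def pass_saving(P):
    """Fractional drop from log_P S to log_{P+1} S; independent of S."""
    return 1 - math.log(P) / math.log(P + 1)

for P in (2, 3, 5, 9, 20):
    print(P, f"{100 * pass_saving(P):.1f}%")
```

The saving is roughly 37 percent when going from P = 2 to P = 3, but under 5 percent from P = 9 to P = 10.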

• On  machines  like  MIX,  which  have  fixed  rather  small  block  sizes,  hardly  any 
internal  memory  is  needed  while  merging.  Oscillating  sort  then  becomes  more 
attractive,  because  it  becomes  possible  to  maintain  the  replacement  selection 
tree  in  memory  while  merging.  In  fact  we  can  improve  on  oscillating  sort  in  this 
case  (as  suggested  by  Colin  J.  Bell  in  1962),  merging  a new  initial  run  into  the 
output  every  time  we  merge  from  the  working  tapes. 

• We  have  observed  that  multireel  files  should  be  sorted  one  reel  at  a time, 
in  order  to  avoid  excessive  tape  handling.  This  is  sometimes  called  a “reel  time” 
application.  Actually  a balanced  merge  on  six  tapes  can  sort  three  reelfuls,  up 
until  the  time  of  the  final  merge,  if  it  has  been  programmed  carefully. 

To merge a fairly large number of individually sorted reels, a minimum-path-length merging tree will be fastest (see Section 5.4.4). This construction was first made by E. H. Friend [JACM 3 (1956), 166-167]; then W. H. Burge [Information and Control 1 (1958), 181-197] pointed out that an optimum way to merge runs of given (possibly unequal) lengths is obtained by constructing a tree with minimum weighted path length, using the run lengths as weights (see Sections 2.3.4.5 and 5.4.9), if we ignore tape handling time.
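Burge's observation amounts to Huffman's greedy construction of a tree with minimum weighted path length: repeatedly merge the shortest available runs. Here is a Python sketch under the same simplification of ignoring tape handling. The name `optimal_merge_cost` is hypothetical, and padding with zero-length dummy runs for P > 2 is the standard device for P-ary trees; it is not stated in this section.

```python
import heapq

def optimal_merge_cost(run_lengths, P=2):
    """Total records moved by an optimal P-way merge tree (Huffman's method),
    i.e. the sum over runs of length * number of merges the run takes part in."""
    heap = list(run_lengths)
    # Pad with empty dummy runs so that every merge can be exactly P-way.
    while (len(heap) - 1) % (P - 1) != 0:
        heap.append(0)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap) for _ in range(P))  # P shortest runs
        total += merged
        heapq.heappush(heap, merged)
    return total

print(optimal_merge_cost([1, 2, 3, 4]))        # 19
print(optimal_merge_cost([1, 2, 3, 4, 5], 3))  # 21
```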




• Our  discussions  have  blithely  assumed  that  we  have  direct  control  over 
the  input/output  instructions  for  tape  units,  and  that  no  complicated  operating 
system  keeps  us  from  using  tape  as  efficiently  as  the  tape  designers  intended. 
These  idealistic  assumptions  give  us  insights  into  the  tape  merging  problem,  and 
may  give  some  insights  into  the  proper  design  of  operating  system  interfaces, 
but  we  should  realize  that  multiprogramming  and  multiprocessing  can  make  the 
situation  considerably  more  complicated. 

• The issues we have studied in this section were first discussed in print by E. H. Friend [JACM 3 (1956), 134-168], W. Zoberbier [Elektronische Datenverarbeitung 5 (1960), 28-44], and M. A. Goetz [Digital Computer User's Handbook (New York: McGraw-Hill, 1967), 1.292-1.320].

Summary.  We  can  sum  up  what  we  have  learned  about  the  relative  efficiencies 
of  different  approaches  to  tape  sorting  in  the  following  way: 

Theorem  A.  It  is  difficult  to  decide  which  merge  pattern  is  best  in  a given 
situation.  | 

The  examples  we  have  seen  in  Chart  A show  how  100,000  randomly  ordered 
100-character  records  (or  1 million  10-character  records)  might  be  sorted  using 
six  tapes  under  realistic  assumptions.  This  much  data  fills  about  half  of  a tape, 
and  it  can  be  sorted  in  about  15  to  19  minutes  on  the  MIXT  tapes.  However,  there 
is  considerable  variation  in  available  tape  equipment,  and  running  times  for  such 
a job  could  vary  between  about  four  minutes  and  about  two  hours  on  different 
machines  of  the  1970s.  In  our  examples,  about  3 minutes  of  the  total  time  were 
used  for  initial  distribution  of  runs  and  internal  sorting;  about  4^  minutes  were 
used  for  the  final  merge  and  rewinding  the  output  tape;  and  about  7^  to  11 1 
minutes  were  spent  in  intermediate  stages  of  merging. 

Given  six  tapes  that  cannot  read  backwards,  the  best  sorting  method  under 
our  assumptions  was  the  “tape-splitting  polyphase  merge”  (example  4);  and  for 
tapes that do allow backward reading, the best method turned out to be read-backward polyphase with a complicated placement of dummy runs (example 7).
Oscillating  sort  (example  9)  was  a close  second.  In  both  cases  the  cascade  merge 
provided  a simpler  alternative  that  was  only  slightly  slower  (examples  5 and  8). 
In  the  read-forward  case,  a straightforward  balanced  merge  (example  1)  was 
surprisingly  effective,  partly  by  luck  in  this  particular  example  but  partly  also 
because  it  spends  comparatively  little  time  rewinding. 

The  situation  would  change  somewhat  if  we  had  a different  number  of 
available  tapes. 

Sort generators. Given the wide variability of data and equipment characteristics, it is almost impossible to write a single external sorting program that is satisfactory in a variety of different applications. And it is also rather difficult to prepare a program that really handles tapes efficiently. Therefore the preparation of sorting software is a particularly challenging job. A sort generator is a program that produces machine code specially tailored to particular sorting applications, based on parameters that describe the data format and the hardware configuration. Such a program is often tied to high-level languages such as COBOL or PL/I.

Chart A. Tape merging.

One of the features normally provided by a sort generator is the ability to insert the user's “own coding,” a sequence of special instructions to be incorporated into the first and last passes of the sorting routine. First-pass own coding is usually used to edit the input records, often shrinking them or slightly expanding them into a form that is easier to sort. For example, suppose that the input records are to be sorted on a nine-character key that represents a date in month-day-year format:

JUL041776  OCT311517  NOV051605  JUL141789  NOV071917

On the first pass the three-letter month code can be looked up in a table, and the month codes can be replaced by numbers with the most significant fields at the left:

17760704  15171031  16051105  17890714  19171107

This decreases the record length and makes subsequent comparisons much simpler. (An even more compact code could also be substituted.) Last-pass own
coding  can  be  used  to  restore  the  original  format,  and/or  to  make  other  desired 
changes  to  the  file,  and/or  to  compute  some  function  of  the  output  records.  The 
merging  algorithms  we  have  studied  are  organized  in  such  a way  that  it  is  easy 
to  distinguish  the  last  pass  from  other  merges.  Notice  that  when  own  coding 
is  present  there  must  be  at  least  two  passes  over  the  file  even  if  it  is  initially 
in  order.  Own  coding  that  changes  the  record  size  can  make  it  difficult  for  the 
oscillating  sort  to  overlap  some  of  its  input/output  operations. 
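A Python sketch of this first-pass and last-pass own coding, treating each key as a string. The function names and the month table are hypothetical illustrations, not part of any particular sort generator.

```python
MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
NAMES = {i: m for m, i in MONTHS.items()}

def first_pass(key):
    """MMMDDYYYY -> YYYYMMDD: most significant field first, all digits."""
    month, day, year = key[:3], key[3:5], key[5:9]
    return f"{year}{MONTHS[month]:02d}{day}"

def last_pass(key):
    """YYYYMMDD -> MMMDDYYYY: restore the original format."""
    year, month, day = key[:4], key[4:6], key[6:8]
    return f"{NAMES[int(month)]}{day}{year}"

keys = "JUL041776 OCT311517 NOV051605 JUL141789 NOV071917".split()
encoded = [first_pass(k) for k in keys]
print(encoded)
print([last_pass(k) for k in sorted(encoded)])
```

The encoded keys are one character shorter and sort correctly as plain strings; the last pass restores the original nine-character form.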

Sort  generators  also  take  care  of  system  details  like  tape  label  conventions, 
and  they  often  provide  for  “hash  totals”  or  other  checks  to  make  sure  that  none 
of  the  data  has  been  lost  or  altered.  Sometimes  there  are  provisions  for  stopping 
the  sort  at  convenient  places  and  resuming  later.  The  fanciest  generators  allow 
records  to  have  dynamically  varying  lengths  [see  D.  J.  Waks,  CACM  6 (1963), 
267-272]. 

*Merging  with  fewer  buffers.  We  have  seen  that  2P  + 2 buffers  are  sufficient 
to  keep  tapes  moving  rapidly  during  a P-way  merge.  Let  us  conclude  this  section 
by  making  a mathematical  analysis  of  the  merging  time  when  fewer  than  2P  + 2 
buffers  are  present. 

Two  output  buffers  are  clearly  desirable,  since  we  can  be  writing  from  one 
while  forming  the  next  block  of  output  in  the  other.  Therefore  we  may  ignore 
the  output  question  entirely,  and  concentrate  only  on  the  input. 

Suppose there are P + Q input buffers, where $1 \le Q < P$. We shall use the following approximate model of the situation, as suggested by L. J. Woodrum [IBM Systems J. 9 (1970), 118-144]: It takes one unit of time to read a block of tape. During this time there is a probability $p_0$ that no input buffers have been emptied, $p_1$ that one has been emptied, $p_{\ge 2}$ that two or more have been, etc. When completing a tape read we are in one of Q + 1 states:
State 0. Q buffers are empty; we begin to read a block into one of them from the appropriate file, using the forecasting technique explained earlier in this section. After one unit of time we go to state 1 with probability $p_0$, otherwise we remain in state 0.

State 1. $Q - 1$ buffers are empty; we begin to read into one of them, forecasting the appropriate file. After one unit of time we go to state 2 with probability $p_0$, to state 1 with probability $p_1$, and to state 0 with probability $p_{\ge 2}$.

. . .

State $Q - 1$. One buffer is empty; we begin to read into it, forecasting the appropriate file. After one unit of time we go to state Q with probability $p_0$, to state $Q - 1$ with probability $p_1$, . . . , to state 1 with probability $p_{Q-1}$, and to state 0 with probability $p_{\ge Q}$.

State Q. All buffers are filled. Tape reading stops for an average of $\mu$ units of time and then we go to state $Q - 1$.

We  start  in  state  0.  This  model  of  the  situation  corresponds  to  a Markov 
process  (see  exercise  2.3.4.2-26),  which  can  be  analyzed  via  generating  functions 
in  the  following  interesting  way:  Let  z be  an  arbitrary  parameter,  and  assume 
that  each  time  we  have  a chance  to  read  from  tape  we  make  a decision  to  do  so 
with  probability  z,  but  we  decide  to  terminate  the  algorithm  with  probability 
l-z.  Now  let  gqiz)  — J2n>o  an*^zn(l  - z)  be  the  average  number  of  times  that 
state  Q occurs  in  such  a process;  it  follows  that  a^'1  is  the  average  number  of 
times  state  Q occurs  when  exactly  n blocks  have  been  read.  Then  n + a(Q>p  is 
the  average  total  time  for  input  plus  computation.  If  we  had  perfect  overlap,  as 
in  the  (2 P + 2)-buffer  algorithm,  the  total  time  would  be  only  n units,  so  a^p 
represents  the  “reading  hangup”  time. 
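
The process can also be followed numerically, without generating functions, by pushing the probability distribution over states 0, ..., Q − 1 forward one block read at a time and accumulating the probability of entering state Q. The Python below is only an illustrative sketch, not part of Woodrum's analysis: the function name `hangup_counts` is hypothetical, and the list `p` (with `p[k]` the probability that exactly k buffers are emptied during one read, the entries summing to 1) is an assumed representation of the $p_k$.

```python
def hangup_counts(p, Q, n_max):
    """Return [a_0, ..., a_{n_max}]: a_n is the expected number of times state Q
    (all Q buffers full) occurs when exactly n blocks have been read.

    p[k] is the probability that exactly k input buffers are emptied while one
    block is being read (assumed to sum to 1). In state i (0 <= i < Q),
    Q - i buffers are empty, as in the model above.
    """
    dist = [0.0] * Q          # dist[i]: probability that the next read starts in state i
    dist[0] = 1.0             # we start in state 0
    a, visits = [0.0], 0.0
    for _ in range(n_max):
        new = [0.0] * Q
        for i, prob in enumerate(dist):
            for k, pk in enumerate(p):
                j = max(i + 1 - k, 0)      # one buffer filled, k buffers emptied
                if j == Q:                 # all buffers full: reading stalls ...
                    visits += prob * pk
                    j = Q - 1              # ... until one buffer is emptied
                new[j] += prob * pk
        dist = new
        a.append(visits)
    return a


print(hangup_counts([0.3, 0.5, 0.2], 2, 5))
```

For Q = 1 the sketch gives $a_n^{(1)} = np_0$, and for larger Q the ratio $a_n^{(Q)}/n$ can be watched directly as n grows.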

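Let $A_{ij}$ be the probability that we go from state i to state j in this process,
for $0 \le i, j \le Q + 1$, where Q + 1 is a new “stopped” state. For example, the
A-matrix takes the following forms for small Q:

$$Q = 1:\quad A = \begin{pmatrix} p_{\ge1}z & p_0z & 1 - z\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix};\qquad Q = 2:\quad A = \begin{pmatrix} p_{\ge1}z & p_0z & 0 & 1 - z\\ p_{\ge2}z & p_1z & p_0z & 1 - z\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}.$$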

Exercise 2.3.4.2-26(b) tells us that $g_Q(z) = \mathrm{cofactor}_{Q0}(I - A)/\det(I - A)$. Thus
for example when Q = 1 we have


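$$g_1(z) = \frac{\det\begin{pmatrix} 0 & -p_0z & z - 1\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}}{\det\begin{pmatrix} 1 - p_{\ge1}z & -p_0z & z - 1\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}} = \frac{p_0z}{1 - p_{\ge1}z - p_0z} = \frac{p_0z}{1 - z} = \sum_{n\ge0} np_0z^n(1 - z),$$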


so $a_n^{(1)} = np_0$. This of course was obvious a priori, since the problem is very
simple when Q = 1. A similar calculation when Q = 2 (see exercise 14) gives
the less obvious formula


$$a_n^{(2)} = \frac{p_0^2\,n}{1 - p_1} - \frac{p_0^2(1 - p_1^n)}{(1 - p_1)^2}. \qquad (10)$$


In general we can show that $a_n^{(Q)}$ has the form $a^{(Q)}n + O(1)$ as $n \to \infty$, where
the constant $a^{(Q)}$ is not terribly difficult to calculate. (See exercise 15.) It turns
out that $a^{(3)} = p_0^3/\bigl((1 - p_1)^2 - p_0p_2\bigr)$.

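The nature of merging makes it fairly reasonable to assume that μ, the average
stall time in state Q, is 1/P and that we have a binomial distribution

$$p_k = \binom{P}{k}\Bigl(\frac{1}{P}\Bigr)^k\Bigl(\frac{P - 1}{P}\Bigr)^{P - k}.$$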


For example, when P = 5 we have $p_0 = .32768$, $p_1 = .4096$, $p_2 = .2048$,
$p_3 = .0512$, $p_4 = .0064$, and $p_5 = .00032$; hence $a^{(1)} \approx 0.328$, $a^{(2)} \approx 0.182$, and
$a^{(3)} \approx 0.125$. In other words, if we use 5 + 3 input buffers instead of 5 + 5, we
can expect an additional “reading hangup” time of about 0.125/5 ≈ 2.5 percent.
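
The figures quoted above can be reproduced from the binomial model. The sketch below (function name hypothetical) computes the $p_k$ for P = 5 and evaluates the constants $a^{(1)} = p_0$, $a^{(2)} = p_0^2/(1 - p_1)$ (the coefficient of n in formula (10)), and the stated value of $a^{(3)}$; the last line multiplies $a^{(3)}$ by the assumed average stall time of 1/P units to express the extra hangup as a fraction of the ideal time n.

```python
from math import comb

def buffer_probabilities(P):
    """Binomial model: p_k = C(P, k) (1/P)^k ((P - 1)/P)^(P - k), k = 0..P."""
    return [comb(P, k) * (1 / P) ** k * ((P - 1) / P) ** (P - k) for k in range(P + 1)]

P = 5
p = buffer_probabilities(P)
a1 = p[0]                                          # a_n^(1) = n p_0
a2 = p[0] ** 2 / (1 - p[1])                        # coefficient of n in (10)
a3 = p[0] ** 3 / ((1 - p[1]) ** 2 - p[0] * p[2])   # constant quoted for Q = 3
mu = 1 / P                                         # assumed average stall time
print([round(x, 5) for x in p])
print(round(a1, 3), round(a2, 3), round(a3, 3))
print(f"extra hangup with Q = 3: about {100 * a3 * mu:.1f} percent")
```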

Of  course  this  model  is  only  a very  rough  approximation;  we  know  that  when 
Q = P there  is  no  hangup  time  at  all,  but  the  model  says  that  there  is.  The 
extra  reading  hangup  time  for  smaller  Q just  about  counterbalances  the  savings 
in  overhead  gained  by  having  larger  blocks,  so  the  simple  scheme  with  Q = P 
seems  to  be  vindicated. 


EXERCISES 

1.  [13]  Give  a formula  for  the  exact  number  of  characters  per  tape,  when  every  block 
on  the  tape  contains  n characters.  Assume  that  the  tape  could  hold  exactly  23000000 
characters  if  there  were  no  interblock  gaps. 

2.  [15]  Explain  why  the  first  buffer  for  File  2,  in  line  6 of  Fig.  84,  is  completely 
blank. 

3. [20] Would Algorithm F work properly if there were only 2P − 1 input buffers
instead of 2P? If so, prove it; if not, give an example where it fails.

4.  [20]  How  can  Algorithm  F be  changed  so  that  it  works  also  when  P = 1? 

► 5.  [21]  When  equal  keys  are  present  on  different  files,  it  is  necessary  to  be  very 
careful  in  the  forecasting  process.  Explain  why,  and  show  how  to  avoid  difficulty  by 
defining  the  merging  and  forecasting  operations  of  Algorithm  F more  precisely. 

6. [22] What changes should be made to Algorithm 5.4.3C in order to convert it
into  an  algorithm  for  cascade  merge  with  rewind  overlap,  on  T + 1 tapes? 

► 7. [26] The initial distribution in example 7 of Chart A produces

$$(A_1D_1)^{11}\qquad D_1(A_1D_1)^{10}\qquad D_1(A_1D_1)^{9}\qquad D_1(A_1D_1)^{7}$$

on tapes 1-4, where $(A_1D_1)^7$ means $A_1D_1A_1D_1A_1D_1A_1D_1A_1D_1A_1D_1A_1D_1$. Show
how to insert additional $A_0$'s and $D_0$'s in a “best possible” way (in the sense that
the overall number of initial runs processed while merging is minimized), bringing the
distribution up to

$$A(DA)^{14}\qquad (DA)^{28}\qquad (DA)^{26}\qquad (DA)^{22}.$$

Hint: To preserve parity it is necessary to insert many of the $A_0$'s and $D_0$'s as adjacent
pairs. The merge numbers for each initial run may be computed as in exercise 5.4.4-5;
some simplification occurs since adjacent runs always have adjacent merge numbers.

8.  [20]  Chart  A shows  that  most  of  the  schemes  for  initial  distribution  of  runs  (with 
the  exception  of  the  initial  distribution  for  the  cascade  merge)  tend  to  put  consecutive 
runs  onto  different  tapes.  If  consecutive  runs  went  onto  the  same  tape  we  could  save  the 
stop/start time; would it therefore be a good idea to modify the distribution algorithms
so  that  they  switch  tapes  less  often? 

► 9.  [22]  Estimate  how  long  the  read-backward  polyphase  algorithm  would  have  taken 
in  Chart  A,  if  we  had  used  all  T = 6 tapes  for  sorting,  instead  of  T = 5 as  in  example  7. 
Was  it  wise  to  avoid  using  the  input  tape? 

10.  [M23]  Use  the  analyses  in  Sections  5.4.2  and  5.4.3  to  show  that  the  length  of 
each  rewind  during  a standard  six-tape  polyphase  or  cascade  merge  is  rarely  more  than 
about  54  percent  of  the  file  (except  for  the  initial  and  final  rewinds,  which  cover  the 
entire  file). 

11.  [23]  By  modifying  the  appropriate  entries  in  Table  1,  estimate  how  long  the  first 
nine  examples  of  Chart  A would  have  taken  if  we  had  a combined  low  speed/high  speed 
rewind.  Assume  that  p = 1 when  the  tape  is  less  than  about  one-fourth  full,  and  that 
the  rewind  time  for  fuller  tapes  is  approximately  five  seconds  plus  the  time  that  would 
be  obtained  for  p = | . Change  example  8 so  that  it  uses  cascade  merge  with  copying, 
since  rewinding  and  reading  forward  is  slower  than  copying  in  this  case.  [Hint:  Use  the 
result  of  exercise  10.] 

12.  [40]  Consider  partitioning  six  tapes  into  three  pairs  of  tapes,  with  each  pair 
playing  the  role  of  a single  tape  in  a polyphase  merge  with  T = 3.  One  tape  of  each 
pair will contain blocks 1, 3, 5, ... and the other tape will contain blocks 2, 4, 6, ...; in
this  way  we  can  essentially  have  two  input  tapes  and  two  output  tapes  active  at  all 
times  while  merging,  effectively  doubling  the  merging  speed. 

a)  Find  an  appropriate  way  to  extend  Algorithm  F to  this  situation.  How  many 
buffers  should  there  be? 

b) Estimate the total running time that would be obtained if this method were used
to sort 100,000 100-character records, considering both the read-forward and
read-backward cases.

13.  [20]  Can  a five-tape  oscillating  sort,  as  defined  in  Algorithm  5.4.5B,  be  used  to 
sort  four  reelfuls  of  input  data,  up  until  the  time  of  the  final  merge? 

14. [M19] Derive (10).

15. [HM29] Prove that $g_Q(z) = h_Q(z)/(1 - z)$, where $h_Q(z)$ is a rational function of z
having no singularities inside the unit circle; hence $a_n^{(Q)} = h_Q(1)n + O(1)$ as $n \to \infty$.
In particular, show that

16. [41] Carry out detailed studies of the problem of sorting 100,000 100-character
records,  drawing  charts  such  as  those  in  Chart  A,  assuming  that  3,  4,  or  5 tapes  are 
available. 

*5.4.7.  External  Radix  Sorting 

The  previous  sections  have  discussed  the  process  of  tape  sorting  by  merging; 
but  there  is  another  way  to  sort  with  tapes,  based  on  the  radix  sorting  principle 
that  was  once  used  in  mechanical  card  sorters  (see  Section  5.2.5).  This  method 
is  sometimes  called  distribution  sorting,  column  sorting,  pocket  sorting,  digital 
sorting,  separation  sorting,  etc.;  it  turns  out  to  be  essentially  the  opposite  of 
merging! 

Suppose, for example, that we have four tapes and that there are only eight
possible keys: 0, 1, 2, 3, 4, 5, 6, 7. If the input data is on tape T1, we can begin
by transferring all even keys to T3, all odd keys to T4:

          T1                   T2            T3              T4
Given     {0,1,2,3,4,5,6,7}    —             —               —
Pass 1    —                    —             {0,2,4,6}       {1,3,5,7}

Now we rewind, and read T3 and then T4, putting {0, 1, 4, 5} on T1 and
{2, 3, 6, 7} on T2:

Pass 2    {0,4}{1,5}           {2,6}{3,7}    —               —

(The notation “{0, 4}{1, 5}” stands for a file that contains some records whose
keys are all 0 or 4 followed by records whose keys are all 1 or 5. Notice that T1
now contains those keys whose middle binary digit is 0.) After rewinding again
and distributing 0, 1, 2, 3 to T3 and 4, 5, 6, 7 to T4, we have

Pass 3    —                    —             {0}{1}{2}{3}    {4}{5}{6}{7}

Now we can finish up by copying T4 to the end of T3. In general, if the keys
range from 0 to $2^k - 1$, we could sort the file in an analogous way using k passes,
followed by a final collection phase that copies about half of the data from one
tape to another. With six tapes we could use radix 3 representations in a similar
way, to sort keys from 0 to $3^k - 1$ in k passes.
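
A minimal simulation of the eight-key procedure, generalized to keys from 0 up to $2^k - 1$, is sketched below. Tapes are modeled as Python lists that are written at the end and read from the front, so every distribution preserves the order of records within a tape, as read-forward tape does. The function name is hypothetical, and measuring passes as total records read divided by N is our reading of how the text counts them.

```python
def four_tape_radix_sort(keys, k):
    """Binary radix sort of keys in range(2**k) on four simulated tapes.

    Pass number `bit` reads the current pair of tapes, first tape then second,
    and sends each key to the other pair according to that bit; a collection
    phase finally copies the second tape of the last pair onto the first.
    Returns (sorted keys, passes), where passes = records read / N.
    """
    n = len(keys)
    if n == 0:
        return [], 0.0
    pair = [list(keys), []]            # T1 holds the input, T2 is empty
    reads = 0
    for bit in range(k):
        out = [[], []]                 # the other pair of tapes, freshly rewound
        for tape in pair:              # read the first tape, then the second
            for key in tape:
                out[(key >> bit) & 1].append(key)
                reads += 1
        pair = out
    reads += len(pair[1])              # collection: copy the second tape to the first
    return pair[0] + pair[1], reads / n


print(four_tape_radix_sort([5, 3, 0, 6, 1, 7, 2, 4], 3))
```

With each of the eight keys present once, three distribution passes plus a collection phase that copies half the data account for the 3.5 passes mentioned below.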

Partial-pass methods can also be used. For example, suppose that there
are ten possible keys {0, 1, ..., 9}, and consider the following procedure due to
R. L. Ashenhurst [Theory of Switching, Progress Report BL-7 (Harvard Univ.
Comp. Laboratory: May 1954), I.1-I.76]:


Phase    T1                       T2           T3              T4               Passes
Given    {0,1,...,9}              —            —               —
1        —                        {0,2,4,7}    {1,5,6}         {3,8,9}          1.0
2        {0}                      —            {1,5,6}{2,7}    {3,8,9}{4}       0.4
3        {0}{1}{2}                {6}{7}       —               {3,8,9}{4}{5}    0.5
4        {0}{1}{2}{3}             {6}{7}{8}    {9}             {4}{5}           0.3
C        {0}{1}{2}{3}{4}...{9}    —            —               —                0.6
Total                                                                           2.8

Here  C represents  the  collection  phase.  If  each  key  value  occurs  about  one-tenth 
of  the  time,  the  procedure  above  takes  only  2.8  passes  to  sort  ten  keys,  while  the 
first  example  required  3.5  passes  to  sort  only  eight  keys.  Therefore  we  find  that 
a clever  distribution  pattern  can  make  a significant  difference,  for  radix  sorting 
as  well  as  for  merging. 

The distribution patterns in the examples above can conveniently be represented
as tree structures:


The  circular  internal  nodes  of  these  trees  are  numbered  1,  2,  3,  ... , corresponding 
to  steps  1,  2,  3,  . . . of  the  process.  Tape  names  A,  B,  C,  D (instead  of  Tl,  T2, 
T3,  T4)  have  been  placed  next  to  the  lines  of  the  trees,  in  order  to  show  where 
the  records  go.  Square  external  nodes  represent  portions  of  a file  that  contain 
only  one  key,  and  that  key  is  shown  in  boldface  type  just  below  the  node.  The 
lines just above square nodes all carry the name of the output tape (C in the
first example, A in the second).

Thus, step 3 of example 1 consists of reading from tape D and writing 1s
and  5s  on  tape  A,  3s  and  7s  on  tape  B.  It  is  not  difficult  to  see  that  the  number 
of  passes  performed  is  equal  to  the  external  path  length  of  the  tree  divided  by 
the  number  of  external  nodes,  if  we  assume  that  each  key  occurs  equally  often. 
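
This pass count can be checked by recording, for each key, how many times the records with that key are read. The sum of these counts is the external path length of the distribution tree, and dividing by the number of keys gives the number of passes when all keys occur equally often. The subfiles listed below are our own transcription of the phases of Ashenhurst's ten-key pattern (the collection phase reads the rest of T4, then T2, then T3). The names are hypothetical.

```python
from collections import Counter

# Subfiles read during each phase of Ashenhurst's ten-key pattern (our transcription).
ASHENHURST_READS = {
    "1": [set(range(10))],                 # all of T1
    "2": [{0, 2, 4, 7}],                   # T2
    "3": [{1, 5, 6}, {2, 7}],              # T3
    "4": [{3, 8, 9}],                      # first subfile of T4
    "C": [{4}, {5}, {6}, {7}, {8}, {9}],   # rest of T4, then T2, then T3
}

def passes_from_reads(reads, keys):
    """Return (external path length, passes), assuming all keys equally frequent."""
    depth = Counter()            # depth[key]: times the records with this key are read
    for subfiles in reads.values():
        for subfile in subfiles:
            depth.update(subfile)
    path_length = sum(depth[key] for key in keys)
    return path_length, path_length / len(keys)


print(passes_from_reads(ASHENHURST_READS, range(10)))
```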

Because of the sequential nature of tape, and the first-in-first-out discipline
of  forwards  reading,  we  can’t  simply  use  any  labeled  tree  as  the  basis  of  a 
distribution  pattern.  In  the  tree  of  example  1,  data  gets  written  on  tape  A 
during  step  2 and  step  3;  it  is  necessary  to  use  the  data  written  during  step  2 
before  we  use  the  data  written  during  step  3.  In  general  if  we  write  onto  a tape 
during  steps  i and  j,  where  i < j,  we  must  use  the  data  written  during  step  i 
first; when the tree contains two branches of the form

    i --A--> k    and    j --A--> l,    with i < j,

we must have k < l. Furthermore we cannot write anything onto tape A between
steps k and l, because we must rewind between reading and writing.

The reader who has worked the exercises of Section 5.4.4 will now immediately
perceive that the allowable trees for read-forward radix sorting on T tapes
are precisely the strongly T-fifo trees, which characterize read-forward merge
sorting  on  T tapes!  (See  exercise  5.4.4-20.)  The  only  difference  is  that  all  of 
the  external  nodes  on  the  trees  we  are  considering  here  have  the  same  tape 
labels.  We  could  remove  this  restriction  by  assuming  a final  collection  phase 
that  transfers  all  records  to  an  output  tape,  or  we  could  add  that  restriction  to 
the  rules  for  T-fifo  trees  by  requiring  that  the  initial  distribution  pass  of  a merge 
sort  be  explicitly  represented  in  the  corresponding  merge  tree. 

In  other  words,  every  merge  pattern  corresponds  to  a distribution  pattern, 
and  every  distribution  pattern  corresponds  to  a merge  pattern.  A moment’s 
reflection  shows  why  this  is  so,  if  we  consider  the  actions  of  a merge  sort  and 
imagine  that  time  could  run  backwards:  The  final  output  is  “unmerged”  into 
subfiles,  which  are  unmerged  into  others,  etc.;  at  time  zero  the  output  has  been 
unmerged  into  S runs.  Such  a pattern  is  possible  with  tapes  if  and  only  if 
the  corresponding  radix  sort  distribution  pattern,  for  S keys,  is  possible.  This 
duality  between  merging  and  distribution  is  almost  perfect;  it  breaks  down  only 
in  one  respect,  namely  that  the  input  tape  must  be  saved  at  different  times. 

The  eight-key  example  treated  at  the  beginning  of  this  section  is  clearly 
dual  to  a balanced  merge  on  four  tapes.  The  ten-key  example  with  partial 
passes  corresponds  to  the  following  ten-run  merge  pattern  (if  we  suppress  the 
copy  phases,  steps  6-11  in  the  tree): 


                        T1      T2      T3          T4
Initial distribution    1^4     1^3     1^1         1^2
Tree step 5             1^3     1^2     —           1^2 3^1
Tree step 4             1^2     1^1     2^1         1^2 3^1
Tree step 3             1^1     —       2^1 3^1     1^1 3^1
Tree step 2             —       4^1     3^1         3^1
Tree step 1             10^1    —       —           —

If we compare this to the radix sort above, we see that the methods have
the same structure but are reversed in time, with the tape contents also reversed
from back to front: 1^2 3^1 (two runs each of length 1 followed by one of length 3)
corresponds to {3, 8, 9}{4}{5} (two subfiles containing one key each, preceded
by one subfile containing three).

Going  the  other  way,  we  can  in  principle  construct  a radix  sort  dual  to 
polyphase  merge,  another  one  dual  to  cascade  merge,  etc.  For  example,  the 
21-run  polyphase  merge  on  three  tapes,  illustrated  at  the  beginning  of  Section 
5.4.2,  corresponds  to  the  following  interesting  radix  sort: 


Phase   T1                                            T2                                             T3
Given   {0,1,...,20}                                  —                                              —
1       —                                             {0,2,4,5,7,9,10,12,13,15,17,18,20}             {1,3,6,8,11,14,16,19}
2       {0,5,10,13,18}                                —                                              {1,3,6,8,11,14,16,19}{2,4,7,9,12,15,17,20}
3       {0,5,10,13,18}{1,6,11,14,19}{2,7,12,15,20}    {3,8,16}{4,9,17}                               —
4       —                                             {3,8,16}{4,9,17}{5,10,18}{6,11,19}{7,12,20}    {0,13}{1,14}{2,15}
5       {8}{9}{10}{11}{12}                            —                                              {0,13}{1,14}{2,15}{3,16}...{7,20}
6       {8}{9}{10}{11}{12}{13}...{20}                 {0}{1}...{7}                                   —


The  distribution  rule  used  here  to  decide  which  keys  go  on  which  tapes  at 
each  step  appears  to  be  magic,  but  in  fact  it  has  a simple  connection  with  the 
Fibonacci  number  system.  (See  exercise  2.) 


Reading  backwards.  Duality  between  radix  sorting  and  merging  applies  also 
to  algorithms  that  read  tapes  backwards.  We  have  defined  “T-lifo  trees”  in 
Section  5.4.4,  and  it  is  easy  to  see  that  they  correspond  to  radix  sorts  as  well  as 
to  merge  sorts. 

A read-backward radix sort was actually considered by John Mauchly already
in 1946, in one of the first papers ever to be published about sorting
(see  Section  5.5);  Mauchly  essentially  gave  the  following  construction: 


Phase    T1                  T2               T3                 T4
Given    —                   {0,1,2,...,9}    —                  —
1        {4,5}               —                {2,3,6,7}          {0,1,8,9}
2        {4,5}{2,7}          {3,6}            —                  {0,1,8,9}
3        {4,5}{2,7}{0,9}     {3,6}{1,8}       —                  —
4        {4,5}{2,7}          {3,6}{1,8}       {9}                {0}
⋮
8        —                   —                {9}{8}{7}{6}{5}    {0}{1}{2}{3}{4}
C        —                   —                —                  {0}{1}{2}{3}{4}{5}...{9}

His  scheme  is  not  the  most  efficient  one  possible,  but  it  is  interesting  because 
it  shows  that  partial  pass  methods  were  considered  for  radix  sorting  already  in 
1946,  although  they  did  not  appear  in  the  literature  for  merging  until  about  1960. 

An  efficient  construction  of  read-backward  distribution  patterns  has  been 
suggested by A. Bayes [CACM 11 (1968), 491-493]: Given P + 1 tapes and
S keys, divide the keys into P subfiles each containing ⌊S/P⌋ or ⌈S/P⌉ keys,
and apply this procedure recursively to each subfile. When S < 2P, one subfile
should  consist  of  the  smallest  key  alone,  and  it  should  be  written  onto  the  output 
file.  (R.  M.  Karp’s  general  preorder  construction,  which  appears  at  the  end  of 
Section  5.4.4,  includes  this  method  as  a special  case.) 

Backward reading makes merging a little more complicated because it reverses
the order of runs. There is a corresponding effect on radix sorting: The
outcome  is  stable  or  “anti-stable”  depending  on  what  level  is  reached  in  the  tree. 
After  a read-backward  radix  sort  in  which  some  of  the  external  nodes  are  at  odd 
levels  and  some  are  at  even  levels,  the  relative  order  of  different  records  with 
equal  keys  will  be  the  same  as  the  original  order  for  some  keys,  but  it  will  be 
the  opposite  of  the  original  order  for  the  other  keys.  (See  exercise  6.) 

Oscillating  merge  sorts  have  their  counterparts  too,  under  duality.  In  an 
oscillating  radix  sort  we  continue  to  separate  out  the  keys  until  reaching  subfiles 
that  have  only  one  key  or  are  small  enough  to  be  internally  sorted;  such  subfiles 
are  sorted  and  written  onto  the  output  tape,  then  the  separation  process  is 
resumed.  For  example,  if  we  have  three  work  tapes  and  one  output  tape,  and  if 
the keys are binary numbers, we may start by putting keys of the form 0x on tape
T1, keys 1x on T2. If T1 receives more than one memory load, we scan it again
and put 00x on T2 and 01x on T3. Now if the 00x subfile is short enough to be
internally sorted, we do so and output the result, then continue by processing
the 01x subfile. Such a method was called a “cascading pseudo-radix sort” by
E.  H.  Friend  [JACM  3 (1956),  157-159];  it  was  developed  further  by  H.  Nagler 
[JACM  6 (1959),  459-468],  who  gave  it  the  colorful  name  “amphisbaenic  sort,” 
and  by  C.  H.  Gaudette  [IBM  Tech.  Disclosure  Bull.  12  (April  1970),  1849-1853]. 
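
The control flow of an oscillating radix sort can be sketched as a recursion on binary prefixes. The Python below is a simplified in-memory model under stated assumptions: integer keys of a known bit length, and a memory capacity measured in records. It reproduces the order in which internally sorted subfiles reach the output tape, but it does not model which work tape holds each subfile, and the function name is hypothetical.

```python
def oscillating_radix_sort(keys, bits, memory):
    """Order in which an oscillating binary radix sort emits keys (simplified model).

    A subfile is split on its next most significant bit until it has only one
    key value or fits in `memory` records; it is then sorted internally and
    written to the output before separation of the next subfile resumes.
    """
    output = []

    def separate(subfile, bit):
        if len(subfile) <= memory or bit < 0:      # bit < 0: a single key value
            output.extend(sorted(subfile))         # internal sort, then output
            return
        zeros = [key for key in subfile if ((key >> bit) & 1) == 0]
        ones = [key for key in subfile if ((key >> bit) & 1) == 1]
        separate(zeros, bit - 1)                   # the ...0x subfile is finished first
        separate(ones, bit - 1)

    separate(list(keys), bits - 1)
    return output


print(oscillating_radix_sort([6, 1, 7, 3, 0, 5, 2, 4, 6, 1], 3, memory=2))
```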

Does radix sorting beat merging? One important consequence of the duality
principle  is  that  radix  sorting  is  usually  inferior  to  merge  sorting.  This  happens 
because  the  technique  of  replacement  selection  gives  merge  sorting  a definite 
advantage;  there  is  no  apparent  way  to  arrange  radix  sorts  so  that  we  can  make 
use  of  internal  sorts  encompassing  more  than  one  memory  load  at  a time.  Indeed, 
the  oscillating  radix  sort  will  often  produce  subfiles  that  are  somewhat  smaller 
than  one  memory  load,  so  the  distribution  pattern  will  correspond  to  a tree  with 
many  more  external  nodes  than  would  be  present  if  merging  and  replacement 
selection  were  used.  Consequently  the  external  path  length  of  the  tree  — the 
sorting  time  — will  be  increased.  (See  exercise  5.3.1-33.) 

On  the  other  hand,  external  radix  sorting  does  have  its  uses.  Suppose, 
for  example,  that  we  have  a file  containing  the  names  of  all  employees  of  a 
large  corporation,  in  alphabetic  order;  the  corporation  has  10  divisions,  and 
it  is  desired  to  sort  the  file  by  division,  retaining  the  alphabetic  order  of  the 
employees  in  each  division.  This  is  a perfect  situation  in  which  to  apply  a stable 
radix  sort,  if  the  file  is  long,  since  the  number  of  records  that  belong  to  each 
of  the  10  divisions  is  likely  to  be  more  than  the  number  of  records  that  would 
be  obtained  in  initial  runs  produced  by  replacement  selection.  In  general,  if  the 
range  of  key  values  is  so  small  that  the  collection  of  records  having  a given  key 
is  expected  to  fill  the  internal  memory  more  than  twice,  it  is  wise  to  use  a radix 
sort  technique. 
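
The property being relied on here is stability: one distribution on the division number keeps the alphabetic order within each division. The sketch below assumes one output area per division for simplicity (with only a few tapes the same distribution would be spread over several passes); the record layout and the function name are hypothetical.

```python
def sort_by_division(records, num_divisions=10):
    """Stable distribution of (name, division) records on the division number.

    The input is assumed to be in alphabetic order of name; appending to each
    division's output area in input order keeps that order within the division.
    """
    areas = [[] for _ in range(num_divisions)]   # one output area per division (assumed)
    for record in records:                       # one sequential read of the input
        areas[record[1]].append(record)
    return [record for area in areas for record in area]


staff = [("Adams", 3), ("Baker", 1), ("Clark", 3), ("Davis", 0), ("Evans", 1)]
print(sort_by_division(staff))
```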

We have seen in Section 5.2.5 that internal radix sorting is superior to
merging,  on  certain  high-speed  computers,  because  the  inner  loop  of  the  radix 
sort  algorithm  avoids  complicated  branching.  If  the  external  memory  is  especially 
fast,  it  may  be  impossible  for  such  machines  to  merge  data  rapidly  enough  to 
keep  up  with  the  input/output  equipment.  Radix  sorting  may  therefore  turn  out 
to  be  superior  to  merging  in  such  a situation,  especially  if  the  keys  are  known  to 
be  uniformly  distributed. 

EXERCISES 

1. [20] The general T-tape balanced merge with parameter P, 1 ≤ P < T, was
defined  near  the  beginning  of  Section  5.4.  Show  that  this  corresponds  to  a radix  sort 
based  on  a mixed-radix  number  system. 

2. [M28] The text illustrates the three-tape polyphase radix sort for 21 keys.
Generalize to the case of $F_n$ keys; explain what keys appear on what tapes at the end of each
phase.  [Hint:  Consider  the  Fibonacci  number  system,  exercise  1.2.8-34.] 

3.  [M35]  Extend  the  results  of  exercise  2 to  the  polyphase  radix  sort  on  four  or  more 
tapes.  (See  exercise  5.4.2-10.) 

4. [M23] Prove that Ashenhurst’s distribution pattern is the best way to sort 10
keys  on  four  tapes  without  reading  backwards,  in  the  sense  that  the  associated  tree  has 
minimum  external  path  length  over  all  strongly  4-fifo  trees.  (Thus,  it  is  essentially  the 
best  method  if  we  ignore  rewind  time.) 

5.  [15]  Draw  the  4-lifo  tree  corresponding  to  Mauchly’s  read-backwards  radix  sort 
for  10  keys. 

► 6.  [20]  A certain  file  contains  two-digit  keys  00,  01,  ...,  99.  After  performing 
Mauchly’s radix sort on the least significant digits, we can repeat the same scheme
on  the  most  significant  digits,  interchanging  the  roles  of  tapes  T2  and  T4.  In  what 
order  will  the  keys  finally  appear  on  T2? 

7.  [21]  Does  the  duality  principle  apply  also  to  multireel  files? 

*5.4.8.  Two-Tape  Sorting 

Since  we  need  three  tapes  to  carry  out  a merge  process  without  excessive  tape 
motion,  it  is  interesting  to  speculate  about  how  we  could  perform  a reasonable 
external  sort  using  only  two  tapes. 

One  approach,  suggested  by  H.  B.  Demuth  in  1956,  is  sort  of  a combined 
replacement-selection and bubble sort. Assume that the input is on tape T1,
and begin by reading P + 1 records into memory. Now output the record whose
key is smallest, to tape T2, and replace it by the next input record. Continue
outputting a record whose key is currently the smallest in memory, maintaining
a selection tree or a priority queue of P + 1 elements. When the input is finally
exhausted, the largest P keys of the file will be present in memory; output them
in ascending order. Now rewind both tapes and repeat the process by reading
from T2 and writing to T1; each such pass puts at least P more records into
their proper place. A simple test can be built into the program that determines
when the entire file is in sort. At most ⌈(N − 1)/P⌉ passes will be necessary.


A few moments’ reflection shows that each pass of this procedure is essentially
equivalent to P consecutive passes of the bubble sort (Algorithm 5.2.2B). If
an  element  has  P or  more  inversions,  it  will  be  smaller  than  everything  in  the  tree 
when  it  is  input,  so  it  will  be  output  immediately  — thereby  losing  P inversions. 
If  an  element  has  fewer  than  P inversions,  it  will  go  into  the  selection  tree  and 
will  be  output  before  all  greater  keys  — thereby  losing  all  its  inversions.  When 
P = 1,  this  is  exactly  what  happens  in  the  bubble  sort,  by  Theorem  5.2.21. 

The total number of passes will therefore be ⌈I/P⌉, where I is the maximum
number of inversions of any element. By the theory developed in Section 5.2.2,
the average value of I is $N - \sqrt{\pi N/2} + 2/3 + O(1/\sqrt{N})$.
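
A small simulation makes the equivalence with the bubble sort concrete. The sketch below models each tape pass with Python's `heapq` as the priority queue of P + 1 records, checks for sortedness between passes (standing in for the "simple test" mentioned above), and computes I directly. The function names are hypothetical.

```python
import heapq

def demuth_pass(records, P):
    """One pass from one tape to the other, keeping P + 1 records in memory."""
    heap = list(records[:P + 1])
    heapq.heapify(heap)
    out = []
    for record in records[P + 1:]:
        # Output the smallest record in memory, then replace it by the next input.
        out.append(heapq.heapreplace(heap, record))
    out.extend(sorted(heap))      # input exhausted: flush memory in ascending order
    return out

def two_tape_bubble_sort(records, P):
    """Repeat passes until the file is in order; return (result, number of passes)."""
    tape, passes = list(records), 0
    while any(x > y for x, y in zip(tape, tape[1:])):   # the "simple test"
        tape = demuth_pass(tape, P)
        passes += 1
    return tape, passes

def max_inversions(records):
    """I = the largest number of greater keys preceding any one key."""
    return max((sum(x > y for x in records[:j]) for j, y in enumerate(records)), default=0)


data = [9, 4, 8, 1, 7, 0, 6, 3, 5, 2]
print(two_tape_bubble_sort(data, 3), max_inversions(data))
```

Comparing `demuth_pass(a, P)` with P ordinary bubble-sort passes on small permutations is a quick way to check the equivalence claimed above, and the pass count agrees with ⌈I/P⌉.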

If  the  file  is  not  too  much  larger  than  the  memory  size,  or  if  it  is  nearly  in 
order  to  begin  with,  this  order-P  bubble  sort  will  be  fairly  rapid;  in  fact,  such  a 
method  might  be  advantageous  even  when  extra  tape  units  are  available,  because 
scratch  tapes  must  be  mounted  by  a human  operator.  But  a two-tape  bubble 
sort  will  run  quite  slowly  on  fairly  long,  randomly  ordered  files,  since  its  average 
running time will be approximately proportional to $N^2$.

Let us consider how this method might be implemented for the 100,000-record
example of Section 5.4.6. We need to choose P intelligently, in order to
compensate for interblock gaps while doing simultaneous reading, writing, and
computing. Since the example assumes that each record is 100 characters long
and that 100,000 characters will fit into memory, we can make room for two
input buffers and two output buffers of size B by setting

$$100(P + 1) + 4B = 100000. \qquad (1)$$

Using the notation of Section 5.4.6, the running time for each pass will be about

$$NC\omega\tau(1 + \rho), \qquad \omega = (B + \gamma)/B. \qquad (2)$$

Since the number of passes is inversely proportional to P, we want to choose B to
be a multiple of 100 that minimizes the quantity ω/P. Elementary calculus shows
that this occurs when B is approximately $\sqrt{24975\gamma + \gamma^2} - \gamma$, so we take B =
3000, P = 879. Setting N = 100000 in the formulas above shows that the number
of passes ⌈I/P⌉ will be about 114, and the total estimated running time will be
approximately 8.57 hours (assuming for convenience that the initial input and
the final output also have B = 3000). This represents approximately 0.44 reelfuls
of data; a full reel would take about five times as long. Some improvements could
be  made  if  the  algorithm  were  interrupted  periodically,  writing  the  records  with 
largest  keys  onto  an  auxiliary  tape  that  is  dismounted,  since  such  records  are 
merely  copied  back  and  forth  once  they  have  been  put  into  order. 
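
The choice B = 3000, P = 879 can be reproduced by a direct search over multiples of 100 subject to the memory constraint 100(P + 1) + 4B = 100000. The sketch assumes an interblock gap of γ = 480 characters; that value is our assumption (this section does not state it), picked because it makes the calculus formula give B ≈ 3000. The pass estimate uses only the leading terms of the average of I.

```python
import math

GAMMA = 480        # assumed interblock gap in characters (not stated in this section)
N = 100_000        # records, 100 characters each
MEMORY = 100_000   # characters of internal memory

def heap_size(B):
    """P from 100(P + 1) + 4B = 100000 (two input and two output buffers of size B)."""
    return (MEMORY - 4 * B) // 100 - 1

def omega_over_P(B):
    """Relative cost: each pass costs about omega = (B + gamma)/B, passes ~ 1/P."""
    return ((B + GAMMA) / B) / heap_size(B)

candidates = [B for B in range(100, MEMORY // 4, 100) if heap_size(B) > 0]
best_B = min(candidates, key=omega_over_P)
P = heap_size(best_B)
calculus_B = math.sqrt(24975 * GAMMA + GAMMA ** 2) - GAMMA   # continuous optimum
I_avg = N - math.sqrt(math.pi * N / 2) + 2 / 3                # leading terms only
passes = math.ceil(I_avg / P)
print(best_B, P, round(calculus_B), passes)
```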

Application  of  quicksort.  Another  internal  sorting  method  that  traverses 
the  data  in  a nearly  sequential  manner  is  the  partition  exchange  or  quicksort 
procedure,  Algorithm  5.2.2Q.  Can  we  adapt  it  to  two  tapes?  [N.  B.  Yoash, 
CACM  8 (1965),  649.] 

It  is  not  difficult  to  see  how  this  can  indeed  be  done,  using  backward  reading. 
Assume  that  the  two  tapes  are  numbered  0 and  1,  and  imagine  that  the  file  is 
laid out as follows:

[Figure: tapes 0 and 1 placed end to end. Tape 0 runs from its beginning of tape (“bottom”) on the left to its current position (“top”) in the middle; tape 1 runs from its current position (“top”) in the middle to its beginning of tape (“bottom”) on the right.]


Each  tape  serves  as  a stack;  putting  them  together  like  this  makes  it  possible  to 
view  the  file  as  a linear  list  in  which  we  can  move  the  current  position  left  or 
right  by  copying  from  one  stack  to  the  other.  The  following  recursive  subroutines 
define  a suitable  sorting  procedure: 

• SORT00 [Sort the top subfile on tape 0 and return it to tape 0].

If the subfile fits in the internal memory, sort it internally and return it to tape.
Otherwise select one record R from the subfile, and let its key be K. Reading
backwards on tape 0, copy all records whose key is > K, forming a new subfile
on the top of tape 1. Now read forward on tape 0, copying all records whose key
is = K onto tape 1. Then read backwards again, copying all records whose key is
< K onto tape 1. Complete the sort by executing SORT10 on the < K keys, then
copying the = K keys to tape 0, and finally executing SORT10 on the > K keys.

• SORT01 [Sort the top subfile on tape 0 and write it on tape 1].

Same as SORT00, but the final “SORT10” is changed to “SORT11” followed by
copying the ≤ K keys to tape 1.

• SORT10 [Sort the top subfile on tape 1 and write it on tape 0].

Same as SORT01, interchanging 0 with 1 and < with >.

• SORT11 [Sort the top subfile on tape 1 and return it to tape 1].

Same as SORT00, interchanging 0 with 1 and < with >.

The  recursive  nature  of  these  subroutines  can  be  handled  without  difficulty  by 
storing  appropriate  control  information  on  the  tapes. 
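
The four subroutines can be simulated with two Python lists used as stacks, the end of each list being its "top". In the sketch below the orientation convention is our own assumption: a sorted subfile on tape 0 is ascending toward the top and one on tape 1 descending toward the top, which is what makes the final copy steps of SORT01 and SORT10 come out in order. The record R is chosen at random, internal sorting is simply Python's `sorted`, Python recursion stands in for the control information stored on tape, and the function name `two_tape_quicksort` is hypothetical.

```python
import operator
import random

def two_tape_quicksort(records, M=4, seed=0):
    """Simulate SORT00 on two tapes used as stacks (the end of a list is its top).

    Assumed convention: a sorted subfile on tape 0 is ascending toward the top,
    one on tape 1 is descending toward the top. M is the number of records
    that fit in memory; the partitioning key is chosen at random.
    """
    rng = random.Random(seed)
    tapes = [list(records), []]

    def copy(src, dst, n):
        # Read n records backward from the top of src, writing each onto dst.
        for _ in range(n):
            tapes[dst].append(tapes[src].pop())

    def sort_subfile(a, dst, n):
        # Sort the top n records of tape a, leaving them on top of tape dst:
        # (a, dst) = (0, 0), (0, 1), (1, 0), (1, 1) is SORT00, SORT01, SORT10, SORT11.
        if n == 0:
            return
        b = 1 - a
        sub = tapes[a][len(tapes[a]) - n:]
        del tapes[a][len(tapes[a]) - n:]
        if n <= M:                                 # fits in memory: sort internally
            tapes[dst].extend(sorted(sub, reverse=(dst == 1)))
            return
        K = rng.choice(sub)
        far = operator.gt if a == 0 else operator.lt    # copied first (> K from tape 0)
        near = operator.lt if a == 0 else operator.gt   # copied last, ends up on top
        far_part = [x for x in reversed(sub) if far(x, K)]    # read backwards
        equal = [x for x in sub if x == K]                    # read forwards
        near_part = [x for x in reversed(sub) if near(x, K)]  # read backwards again
        tapes[b].extend(far_part + equal + near_part)
        sort_subfile(b, a, len(near_part))         # e.g. SORT10 on the < K keys
        copy(b, a, len(equal))                     # the = K keys
        if dst == a:
            sort_subfile(b, a, len(far_part))      # SORT00 / SORT11: finish on tape a
        else:
            sort_subfile(b, b, len(far_part))      # SORT01 / SORT10: far keys stay on b,
            copy(a, b, len(near_part) + len(equal))  # then the others are copied over

    sort_subfile(0, 0, len(records))
    return tapes[0]


print(two_tape_quicksort([5, 3, 9, 1, 5, 8, 2, 7, 0, 6], M=3))
```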

The  running  time  for  this  algorithm  can  be  estimated  as  follows,  if  we  assume 
that  the  data  are  in  random  order,  with  negligible  probability  of  equal  keys.  Let 
M be the number of records that fit into internal memory. Let $X_N$ be the
average number of records read while applying SORT00 or SORT11 to a subfile of
N records, when N > M, and let $Y_N$ be the corresponding quantity for SORT01
or SORT10. Then we have


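$$X_N = \begin{cases} 0, & \text{if } N \le M;\\ 3N + 1 + \dfrac{1}{N}\displaystyle\sum_{0\le k<N}\bigl(Y_k + Y_{N-1-k}\bigr), & \text{if } N > M; \end{cases}$$

$$Y_N = \begin{cases} 0, & \text{if } N \le M;\\ 3N + 2 + \dfrac{1}{N}\displaystyle\sum_{0\le k<N}\bigl(Y_k + X_{N-1-k} + k\bigr), & \text{if } N > M. \end{cases} \qquad (3)$$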

The solution to these recurrences (see exercise 2) shows that the total amount of
tape reading during the external partitioning phases will be $6\frac{2}{3}N\ln N + O(N)$,
on the average, as $N \to \infty$. We also know from Eq. 5.2.2-(25) that the average
number of internal sort phases will be $2(N + 1)/(M + 2) - 1$.
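
Recurrence (3) is easy to evaluate numerically, because both sums over k reduce to running sums of X and Y. The sketch below (function name hypothetical) does this for a given M. Comparing $X_N$ with the leading term of the solution quoted above approaches agreement only slowly, since the O(N) term shrinks relative to $N\ln N$ only like $1/\ln N$.

```python
import math

def partition_reads(n_max, M):
    """Return lists X, Y with X[N], Y[N] given by recurrence (3) for 0 <= N <= n_max.

    X[N]: average records read by SORT00 or SORT11 on N records;
    Y[N]: the same for SORT01 or SORT10; both are 0 when N <= M.
    """
    X = [0.0] * (n_max + 1)
    Y = [0.0] * (n_max + 1)
    sum_x = sum_y = 0.0          # sums of X[k] and Y[k] over 0 <= k < N
    for N in range(n_max + 1):
        if N > M:
            # sum_{k<N} Y[N-1-k] equals sum_y, and sum_{k<N} X[N-1-k] equals sum_x
            X[N] = 3 * N + 1 + 2 * sum_y / N
            Y[N] = 3 * N + 2 + (sum_y + sum_x + N * (N - 1) / 2) / N
        sum_x += X[N]
        sum_y += Y[N]
    return X, Y


X, Y = partition_reads(100_000, 1000)
print(X[-1] / (100_000 * math.log(100_000)))   # compare with 6 2/3
```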

If we apply this analysis to the 100,000-record example of Section 5.4.6,
using 25,000-character buffers and assuming that the sorting time is $2nC\omega\tau$
for a subfile of n ≤ M = 1000 records, we obtain an average sorting time of
approximately 103 minutes (including the final rewind as in Chart A). Thus the
quicksort  method  isn’t  bad,  on  the  average;  but  of  course  its  worst  case  turns 
out  to  be  even  more  awful  than  the  bubble  sort  discussed  above.  Randomization 
will  make  the  worst  case  extremely  unlikely. 

Radix  sorting.  The  radix  exchange  method  (Algorithm  5.2.2R)  can  be  adapted 
to  two-tape  sorting  in  a similar  way,  since  it  is  so  much  like  quicksort.  The  trick 
that  makes  both  of  these  methods  work  is  the  idea  of  reading  a file  more  than 
once,  something  we  never  did  in  our  previous  tape  algorithms. 

The same trick can be used to do a conventional least-significant-digit-first
radix sort on two tapes. Given the input data on T1, we copy all records onto
T2 whose key ends with 0 in binary notation; then after rewinding T1 we read it
again, copying the records whose key ends with 1. Now both tapes are rewound
and a similar pair of passes is made, interchanging the roles of T1 and T2, and
using the second least significant binary digit. At this point T1 will contain all
records whose keys are $(\ldots 00)_2$, followed by those whose keys are $(\ldots 01)_2$, then
$(\ldots 10)_2$, then $(\ldots 11)_2$. If the keys are b bits long, we need only 2b passes over
the file in order to complete the sort.
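A short simulation may make the pass structure concrete. This is a sketch only: each tape is modeled as an in-memory Python list, so rewinding and rereading cost nothing here, keys are assumed to be nonnegative integers of at most b bits, and the name `two_tape_radix_sort` is hypothetical.

```python
import random

def two_tape_radix_sort(keys, b):
    """Simulate the two-tape LSD radix sort on b-bit keys.

    Returns (keys in sorted order, number of passes over the data)."""
    tapes = [list(keys), []]   # the input starts on T1 (index 0); T2 (index 1) is empty
    src, passes = 0, 0
    for bit in range(b):
        dst = 1 - src
        tapes[dst] = []        # rewind the output tape and overwrite it
        for want in (0, 1):    # first pass copies keys with this bit 0, second pass bit 1
            tapes[dst].extend(k for k in tapes[src] if ((k >> bit) & 1) == want)
            passes += 1        # each pass reads the whole source tape once
        src = dst              # interchange the roles of T1 and T2
    return tapes[src], passes

data = [random.randrange(256) for _ in range(1000)]
result, passes = two_tape_radix_sort(data, 8)
print(result == sorted(data), passes)   # True 16
```

Because each pass copies records in the order they are read, every distribution is stable, which is what makes the least-significant-digit-first order correct.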

Such a radix sort could be applied only to the leading b bits of the keys, for
some judiciously chosen number b; that would reduce the number of inversions
by a factor of about $2^b$, if the keys were uniformly distributed, so a few passes of
the P-way bubble sort could then be used to complete the job. This approach
reads tape in the forward direction only.

A novel but somewhat more complicated approach to two-tape distribution
sorting has been suggested by A. I. Nikitin and L. I. Sholmov [Kibernetika 2, 6
(1966), 79-84]. Counts are made of the number of keys having each possible
configuration of leading bits, and artificial keys $\kappa_1, \kappa_2, \ldots$, based on these
counts are constructed so that the number of actual keys lying between $\kappa_i$ and
$\kappa_{i+1}$ is between predetermined limits $P_1$ and $P_2$, for each i. Thus, M lies between
$\lceil N/P_2 \rceil$ and $\lfloor N/P_1 \rfloor$. If the leading bit counts do not give sufficient information
to determine such $\kappa_1, \kappa_2, \ldots, \kappa_M$, one or more further passes are made to count
the frequency of less significant bit patterns, for certain configurations of most
significant bits. After the table of artificial keys $\kappa_1, \kappa_2, \ldots, \kappa_M$ has been
constructed, $2\lceil \lg M \rceil$ further passes will suffice to complete the sort. (This method
requires memory space proportional to N, so it can't be used for external sorting
as $N \to \infty$. In practice we would not use the technique for multireel files, so M
will be comparatively small and the table of artificial keys will fit comfortably
in memory.)

Simulation  of  more  tapes.  F.  C.  Hennie  and  R.  E.  Stearns  have  devised  a 
general  technique  for  simulating  k tapes  on  only  two  tapes,  in  such  a way  that 
the tape motion required is increased by a factor of only $O(\log L)$, where L is the
maximum  distance  to  be  traveled  on  any  one  tape  [JACM  13  (1966),  533-546]. 
Fig. 86. Layout of tape T1 in the Hennie-Stearns construction; nonblank zones are
shaded.

Their  construction  can  be  simplified  slightly  in  the  case  of  sorting,  as  in  the 
following  method  suggested  by  R.  M.  Karp. 

We  shall  simulate  an  ordinary  four-tape  balanced  merge,  using  two  tapes  T1 
and T2. The first of these, T1, holds the simulated tape contents in a way that
may be diagrammed as in Fig. 86; we imagine that the data is written in four
“tracks,” one for each simulated tape. (In actual fact the tape doesn't have such
tracks; blocks 1, 5, 9, 13, … are thought of as Track 1, blocks 2, 6, 10, 14, …
as Track 2, etc.) The other tape, T2, is used only for auxiliary storage, to help
move things around on T1.

The  blocks  of  each  track  are  divided  into  zones,  containing,  respectively, 
1, 2, 4, 8, …, $2^k$, … blocks per zone. Zone k on each track is either filled with
exactly $2^k$ blocks of data, or it is completely blank. In Fig. 86, for example,
Track  1 has  data  in  zones  1 and  3;  Track  2 in  zones  0,  1,  2;  Track  3 in  zones  0 
and  2;  Track  4 in  zone  1;  and  the  other  zones  are  blank. 

Suppose  that  we  are  merging  data  from  Tracks  1 and  2 to  Track  3.  The 
internal  computer  memory  contains  two  buffers  used  for  input  to  a two-way 
merge,  plus  a third  buffer  for  output.  When  the  input  buffer  for  Track  1 becomes 
empty,  we  can  refill  it  as  follows:  Find  the  first  nonempty  zone  on  Track  1,  say 
zone k, and copy its first block into the input buffer; then copy the other $2^k - 1$
blocks of data onto T2, and move them to zones $0, 1, \ldots, k-1$ of Track 1. (Zones
$0, 1, \ldots, k-1$ are now full and zone k is blank.) An analogous procedure is used
to refill the input buffer for Track 2, whenever it becomes empty. When the
output buffer is ready to be written on Track 3, we reverse the process, scanning
across T1 to find the first blank zone on Track 3, say zone k, while copying the
data from zones $0, 1, \ldots, k-1$ onto T2. The data on T2, augmented by the
contents of the output buffer, is now used to fill zone k of Track 3.

This procedure requires the ability to write in the middle of tape T1, without
destroying subsequent information on that tape. As in the case of read-forward
oscillating sort (Section 5.4.5), it is possible to do this reliably if suitable
precautions are taken.

The amount of tape motion required to bring $2^l - 1$ blocks of Track 1 into
memory is $\sum_{0 \le k < l} 2^{l-1-k} \cdot c\,2^k = cl\,2^{l-1}$, for some constant c, since we scan up
to zone k only once in every $2^k$ steps. Thus each merge pass requires $O(N \log N)$
steps. Since there are $O(\log N)$ passes in a balanced merge, the total time to
sort is guaranteed to be $O(N(\log N)^2)$ in the worst case; this is asymptotically
much better than the worst case of quicksort.
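The count $\sum_{0 \le k < l} 2^{l-1-k} \cdot c\,2^k = cl\,2^{l-1}$ can be checked by simulating the input side of the zone scheme for a single track. The Python sketch below is our own; it assumes the same cost model as the text (refilling from zone k costs $c\,2^k$), numbers the blocks so that we can also confirm they reach memory in their original order, and uses the hypothetical name `read_track`.

```python
def read_track(l, c=1):
    """Read, one block at a time, a track whose zones 0..l-1 are all full.

    Returns (block numbers in the order they reach the input buffer, total cost).
    Assumed cost model: reaching and reshuffling zone k costs c * 2**k."""
    zones, nxt = [], 1
    for j in range(l):                      # zone j holds 2**j consecutive blocks
        zones.append(list(range(nxt, nxt + 2 ** j)))
        nxt += 2 ** j
    order, cost = [], 0
    while any(zones):
        k = next(j for j, z in enumerate(zones) if z)   # first nonempty zone
        cost += c * 2 ** k
        order.append(zones[k][0])           # its first block goes to the input buffer
        rest, zones[k] = zones[k][1:], []   # zone k becomes blank
        pos = 0
        for j in range(k):                  # the other 2**k - 1 blocks refill zones 0..k-1
            zones[j] = rest[pos:pos + 2 ** j]
            pos += 2 ** j
    return order, cost

for l in range(1, 8):
    order, cost = read_track(l)
    # the last two numbers on each line should agree: simulated cost and l * 2**(l-1)
    print(l, order == list(range(1, 2 ** l)), cost, l * 2 ** (l - 1))
```

Zone k is the first nonempty zone exactly $2^{l-1-k}$ times, like the carries of a binary counter, which is the source of the factor l.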

But  this  method  wouldn’t  work  very  well  if  we  applied  it  to  the  100,000- 
record  example  of  Section  5.4.6,  since  the  information  specified  for  tape  T1  would 
overflow  the  contents  of  one  tape  reel.  Even  if  we  ignore  this  fact,  and  if  we  use 
optimistic  assumptions  about  read/write/compute  overlap  and  interblock  gap 
lengths,  etc.,  we  find  that  roughly  37  hours  would  be  required  to  complete  the 
sort! So this method is purely of academic interest; the constant in $O(N(\log N)^2)$
is much too high to be satisfactory when N is in a practical range.

One-tape  sorting.  Could  we  live  with  only  one  tape?  It  is  not  difficult  to  see 
that  the  order-P  bubble  sort  described  above  could  be  converted  into  a one-tape 
sort,  but  the  result  would  be  ghastly. 

H.  B.  Demuth  [Ph.D.  thesis  (Stanford  University,  1956),  85]  observed  that  a 
computer  with  bounded  internal  memory  cannot  reduce  the  number  of  inversions 
of  a permutation  by  more  than  a bounded  amount  as  it  moves  a bounded  distance 
on tape; hence every one-tape sorting algorithm must take at least $N^2 d$ units of
time  on  the  average,  for  some  positive  constant  d that  depends  on  the  computer 
configuration. 

R.  M.  Karp  has  pursued  this  topic  in  a very  interesting  way,  discovering  an 
essentially  optimum  way  to  sort  with  one  tape.  It  is  convenient  to  discuss  Karp’s 
algorithm  by  reformulating  the  problem  as  follows:  What  is  the  fastest  way 
to  transport  people  between  floors  using  a single  elevator?  [See  Combinatorial 
Algorithms,  edited  by  Randall  Rustin  (Algorithmics  Press,  1972),  17-21.] 

Consider  a building  with  n floors,  having  room  for  exactly  b people  on  each 
floor.  The  building  contains  no  doors,  windows,  or  stairs,  but  it  does  have  an 
elevator  that  can  stop  on  each  floor.  There  are  bn  people  in  the  building,  and 
exactly b of them want to be on each particular floor. The elevator holds at most
m people, and it takes one unit of time to go from floor i to floor i ± 1. We
wish  to  find  the  quickest  way  to  get  all  the  people  onto  the  proper  floors,  if  the 
elevator  is  required  to  start  and  finish  on  floor  1. 

The  connection  between  this  elevator  problem  and  one-tape  sorting  is  not 
hard  to  see:  The  people  are  the  records  and  the  building  is  the  tape.  The  floors 
are  individual  blocks  on  the  tape,  and  the  elevator  is  the  internal  computer 
memory.  A computer  program  has  more  flexibility  than  an  elevator  operator 
(it  can,  for  example,  duplicate  people,  or  temporarily  chop  them  into  two  parts 
on  different  floors,  etc.);  but  the  solution  below  solves  the  problem  in  the  fastest 
conceivable  time  without  doing  such  operations. 

The  following  two  auxiliary  tables  are  required  by  Karp’s  algorithm. 

$$
\begin{aligned}
u_k,\ 1 \le k \le n:&\quad \text{Number of people on floors} \le k \text{ whose destination is} > k;\\
d_k,\ 1 \le k \le n:&\quad \text{Number of people on floors} \ge k \text{ whose destination is} < k.
\end{aligned}
\tag{4}
$$

When the elevator is empty, we always have $u_k = d_{k+1}$ for $1 \le k < n$, since there
are b people on every floor; the number of misfits on floors $\{1, \ldots, k\}$ must equal
the corresponding number on floors $\{k+1, \ldots, n\}$. By definition, $u_n = d_1 = 0$.




Fig.  87.  Karp’s  elevator  algorithm. 

It is clear that the elevator must make at least $\lceil u_k/m \rceil$ trips from floor k
to floor k + 1, for $1 \le k < n$, since only m passengers can ascend on each trip.
Similarly it must make at least $\lceil d_k/m \rceil$ trips from floor k to floor k − 1. Therefore
the elevator must necessarily take at least

$$
\sum_{k=1}^{n} \bigl(\lceil u_k/m \rceil + \lceil d_k/m \rceil\bigr) \tag{5}
$$

units of time on any correct schedule. Karp discovered that this lower bound
can actually be achieved, when $u_1, \ldots, u_{n-1}$ are nonzero.

Theorem K. If $u_k > 0$ for $1 \le k < n$, there is an elevator schedule that delivers
everyone to the correct floor in the minimum time (5).

Proof.  Assume  that  there  are  m extra  people  in  the  building;  they  start  in 
the  elevator  and  their  destination  floor  is  artificially  set  to  0.  The  elevator  can 
operate  according  to  the  following  algorithm,  starting  with  k (the  current  floor) 
equal  to  1: 

K1. [Move up.] From among the b + m people currently in the elevator or on
floor k, those m with the highest destinations get into the elevator, and the
others remain on floor k.

Let there be u people now in the elevator whose destination is > k,
and d whose destination is ≤ k. (It will turn out that $u = \min(m, u_k)$;
if $u_k < m$ we may therefore be transporting some people away from their
destination. This represents their sacrifice to the common good.) Decrease
$u_k$ by u, increase $d_{k+1}$ by d, and then increase k by 1.

K2. [Still going up?] If $u_k > 0$, return to step K1.

K3. [Move down.] From among the b + m people currently in the elevator or on
floor k, those m with the lowest destinations get into the elevator, and the
others remain on floor k.

Let there be u people now in the elevator whose destination is ≥ k, and
d whose destination is < k. (It will always turn out that u = 0 and d = m,
but the algorithm is described here in terms of general u and d in order to
make the proof a little clearer.) Decrease $d_k$ by d, increase $u_{k-1}$ by u, and
then decrease k by 1.
Fig. 88. An optimum way to rearrange people using a small, slow elevator. (People
are each represented by the number of their destination floor.)


K4. [Still going down?] If k > 1 and $u_{k-1} > 0$, return to step K3. If k = 1
and $u_1 = 0$, terminate the algorithm (everyone has arrived safely and the
m “extras” are back in the elevator). Otherwise return to step K2.

Figure 88 shows an example of this algorithm, with a nine-floor building and
b = 2, m = 3. Note that one of the 6s is temporarily transported to floor 7, in
spite of the fact that the elevator travels the minimum possible distance. The
idea of testing $u_{k-1}$ in step K4 is the crux of the algorithm, as we shall see.

To  verify  the  validity  of  this  algorithm,  we  note  that  steps  K1  and  K3  always 
keep  the  u and  d tables  (4)  up  to  date,  if  we  regard  the  people  in  the  elevator  as 
being  on  the  “current”  floor  k.  It  is  now  possible  to  prove  by  induction  that  the 
following  properties  hold  at  the  beginning  of  each  step: 

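$$
\begin{aligned}
u_l &= d_{l+1}, && \text{for } k \le l < n; && (6)\\
u_l &= d_{l+1} - m, && \text{for } 1 \le l < k; && (7)\\
u_{l+1} &= 0, && \text{if } u_l = 0 \text{ and } k \le l < n. && (8)
\end{aligned}
$$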

Furthermore, at the beginning of step K1, the $\min(u_k, m)$ people with highest
destinations, among all people on floors ≤ k with destination > k, are in the
elevator or on floor k. At the beginning of step K3, the $\min(d_k, m)$ people with
lowest destinations, among all people on floors ≥ k with destination < k, are in
the elevator or on floor k.

From these properties it follows that the parenthesized remarks in steps K1
and K3 are valid. Each execution of step K1 therefore decreases $\lceil u_k/m \rceil$ by 1
and leaves $\lceil d_{k+1}/m \rceil$ unchanged; each execution of K3 decreases $\lceil d_k/m \rceil$ by 1
and leaves $\lceil u_{k-1}/m \rceil$ unchanged. The algorithm must therefore terminate in a
finite number of steps, and everybody must then be on the correct floor because
of (6) and (8). ∎
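Steps K1-K4 are concrete enough to simulate directly. The Python sketch below is our own illustration, not Karp's code: it recomputes $u_k$ from the current positions (counting people in the elevator as being on the current floor) instead of maintaining the tables (4) incrementally, it reads the test in step K2 as $u_k > 0$ for the current floor k, and it charges one unit of time per floor moved. The helper names `karp_elevator`, `lower_bound`, and `random_instance` are hypothetical. It lets us compare the schedule's length with the bound (5) on buildings that satisfy the hypothesis of Theorem K.

```python
import random

def karp_elevator(floors, m):
    """Run steps K1-K4; floors[f] lists destinations of the b people on floor f (1..n).

    Returns (elevator moves, final floors, final elevator contents).
    Assumes n >= 2 and u_k > 0 for 1 <= k < n, as in Theorem K."""
    floors = [list(p) for p in floors]      # work on a copy
    elevator = [0] * m                      # the m "extras", destination 0
    k, moves = 1, 0

    def u(j):
        # people on floors <= j (elevator counted on floor k) whose destination is > j
        total = sum(1 for f in range(1, j + 1) for d in floors[f] if d > j)
        return total + (sum(1 for d in elevator if d > j) if k <= j else 0)

    step = "K1"
    while True:
        if step == "K1":                    # move up with the m highest destinations
            group = sorted(floors[k] + elevator)
            floors[k], elevator = group[:-m], group[-m:]
            k, moves, step = k + 1, moves + 1, "K2"
        elif step == "K2":                  # still going up?
            step = "K1" if u(k) > 0 else "K3"
        elif step == "K3":                  # move down with the m lowest destinations
            group = sorted(floors[k] + elevator)
            elevator, floors[k] = group[:m], group[m:]
            k, moves, step = k - 1, moves + 1, "K4"
        else:                               # K4: still going down?
            if k > 1 and u(k - 1) > 0:
                step = "K3"
            elif k == 1 and u(1) == 0:
                return moves, floors, elevator
            else:
                step = "K2"

def lower_bound(floors, m):
    """Bound (5): sum of ceil(u_k/m) + ceil(d_k/m) over 1 <= k <= n."""
    n = len(floors) - 1
    total = 0
    for k in range(1, n + 1):
        u_k = sum(1 for f in range(1, k + 1) for d in floors[f] if d > k)
        d_k = sum(1 for f in range(k, n + 1) for d in floors[f] if d < k)
        total += -(-u_k // m) + -(-d_k // m)    # integer ceilings
    return total

def random_instance(n, b, rng):
    """Random building with b people per floor and u_k > 0 for 1 <= k < n."""
    while True:
        people = [f for f in range(1, n + 1) for _ in range(b)]
        rng.shuffle(people)
        floors = [[]] + [people[b * i:b * i + b] for i in range(n)]
        if all(sum(1 for f in range(1, k + 1) for d in floors[f] if d > k) > 0
               for k in range(1, n)):
            return floors

example = [[], [1, 3], [2, 2], [3, 1]]          # n = 3, b = 2
print(karp_elevator(example, 2)[0], lower_bound(example, 2))   # 4 4

rng = random.Random(1)
floors = random_instance(9, 2, rng)             # nine floors and b = 2, as in Fig. 88
print(karp_elevator(floors, 3)[0] == lower_bound(floors, 3))
```

Recomputing $u_k$ from scratch costs O(nb) per test, which is fine for checking small cases; a real implementation would update the tables as steps K1 and K3 prescribe.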
When $u_k = 0$ and $u_{k+1} > 0$ we have a “disconnected” situation; the elevator
must journey up to floor k + 1 in order to rearrange the people up there, even
though nobody wants to move from floors ≤ k to floors ≥ k + 1. Without loss
of generality, we may assume that $u_{n-1} > 0$; then every valid elevator schedule
must include at least

$$
2 \sum_{1 \le k < n} \max\bigl(1, \lceil u_k/m \rceil\bigr) \tag{9}
$$

moves,  since  we  require  the  elevator  to  return  to  floor  1.  A schedule  achieving 
this  lower  bound  is  readily  constructed  (exercise  4). 

EXERCISES 

1.  [20]  The  order-P  bubble  sort  discussed  in  the  text  uses  only  forward  reading  and 
rewinding.  Can  the  algorithm  be  modified  to  take  advantage  of  backward  reading? 

2. [M26] Find explicit closed-form solutions for the numbers $X_N$, $Y_N$ defined in (3).
[Hint: Study the solution to Eq. 5.2.2-(19).]

3.  [38]  Is  there  a two-tape  sorting  method,  based  only  on  comparisons  of  keys  (not 
digital properties), whose tape motion is $O(N \log N)$ in the worst case, when sorting
N records? [Quicksort achieves this on the average, but not in the worst case, and the
Hennie-Stearns method (Fig. 86) achieves $O(N(\log N)^2)$.]

4. [M23] In the elevator problem, suppose there are indices p and q, with $q \ge p + 2$,
$u_p > 0$, $u_q > 0$, and $u_{p+1} = \cdots = u_{q-1} = 0$. Explain how to construct a schedule
requiring at most (9) units of time.

► 5.  [M23]  True  or  false:  After  step  K1  of  the  algorithm  in  Theorem  K,  nobody  on 
the  elevator  has  a lower  destination  than  any  person  on  floors  < k. 

6. [M30] (R. M. Karp.) Generalize the elevator problem (Fig. 88) to the case that
there are $b_j$ passengers initially on floor j, and $b'_j$ passengers whose destination is floor j,
for $1 \le j \le n$. Show that a schedule exists that takes $2\sum_{k=1}^{n-1}\max\bigl(1, \lceil u_k/m \rceil, \lceil d_{k+1}/m \rceil\bigr)$
units of time, never allowing more than $\max(b_j, b'_j)$ passengers to be on floor j at any
one time. [Hint: Introduce fictitious people, if necessary, to make $b_j = b'_j$ for all j.]

7.  [M40]  (R.  M.  Karp.)  Generalize  the  problem  of  exercise  6,  replacing  the  linear 
path  of  an  elevator  by  a network  of  roads  to  be  traveled  by  a bus,  given  that  the  network 
forms  any  free  tree.  The  bus  has  finite  capacity,  and  the  goal  is  to  transport  passengers 
to  their  destinations  in  such  a way  that  the  bus  travels  a minimum  distance. 

8. [M32] Let b = 1 in the elevator problem treated in the text. How many
permutations of the n people on the n floors will make $u_k \le 1$ for $1 \le k < n$ in (4)? [For
example, 314592687 is such a permutation.]

► 9. [M25] Find a significant connection between the “cocktail-shaker sort” described
in Section 5.2.2, Fig. 16, and the numbers $u_1, u_2, \ldots, u_n$ of (4) in the case b = 1.

10.  [20]  How  would  you  sort  a multireel  file  with  only  two  tapes? 

*5.4.9.  Disks  and  Drums 

So  far  we  have  considered  tapes  as  the  vehicles  for  external  sorting,  but  more 
flexible  types  of  mass  storage  devices  are  generally  available.  Although  such 
bulk  memory  or  “direct-access  storage”  units  come  in  many  different  forms, 
they  may  be  roughly  characterized  by  the  following  properties: 
i) Any specified part of the stored information can be accessed quickly.

ii)  Blocks  of  consecutive  words  can  be  transmitted  rapidly  between  the  internal 
and  external  memory. 

Magnetic  tape  satisfies  (ii)  but  not  (i),  because  it  takes  a long  time  to  get  from 
one  end  of  a tape  to  the  other. 

Every  external  memory  unit  has  idiosyncrasies  that  ought  to  be  studied 
carefully  before  major  programs  are  written  for  it;  but  technology  changes  so 
rapidly,  it  is  impossible  to  give  a complete  discussion  here  of  all  the  available 
varieties  of  hardware.  Therefore  we  shall  consider  only  some  typical  memory 
devices  that  illustrate  useful  approaches  to  the  sorting  problem. 

One  of  the  most  common  types  of  external  memories  satisfying  (i)  and  (ii)  is 
a disk  device  (see  Fig.  89).  Data  is  kept  on  a number  of  rapidly  rotating  circular 
disks,  covered  with  magnetic  material;  a comb-like  access  arm,  containing  one 
or  more  “read/write  heads”  for  each  disk  surface,  is  used  to  store  and  retrieve 
the  information.  Each  individual  surface  is  divided  into  concentric  rings  called 
tracks, so that an entire track of data passes a read/write head every time the
disk  completes  one  revolution.  The  access  arm  can  move  in  and  out,  shifting 
the  read/write  heads  from  track  to  track;  but  this  motion  takes  time.  A set 
of  tracks  that  can  be  read  or  written  without  repositioning  the  access  arm  is 
called  a cylinder.  For  example,  Fig.  89  illustrates  a disk  unit  that  has  just  one 
read/write  head  per  surface;  the  light  gray  circles  show  one  of  the  cylinders, 
consisting  of  all  tracks  currently  being  scanned  by  the  read/write  heads. 


To  fix  the  ideas,  let  us  consider  hypothetical  MIXTEC  disk  units,  for  which 

1 track  = 5000  characters 
1 cylinder  = 20  tracks 

1 disk  unit  = 200  cylinders 

Such  a disk  unit  contains  20  million  characters,  slightly  less  than  the  amount 
of  data  that  can  be  stored  on  a single  MIXT  magnetic  tape.  On  some  machines, 
tracks  near  the  center  have  fewer  characters  than  tracks  near  the  rim;  this  tends 
to make the programming much more complicated, and MIXTEC fortunately
avoids  such  problems.  (See  Section  5.4.6  for  a discussion  of  MIXT  tapes.  As 
in  that  section,  we  are  studying  classical  techniques  by  considering  machine 
characteristics  that  were  typical  of  the  early  1970s;  modern  disks  are  much  bigger 
and  faster.) 

The  amount  of  time  required  to  read  or  write  on  a disk  device  is  essentially 
the  sum  of  three  quantities: 

• seek  time  (the  time  to  move  the  access  arm  to  the  proper  cylinder); 

• latency  time  (rotational  delay  until  the  read/write  head  reaches  the  right  spot); 

• transmission  time  (rotational  delay  while  the  data  passes  the  read/write  head). 

On MIXTEC devices the seek time required to go from cylinder i to cylinder j is
$25 + \frac{1}{2}|i - j|$ milliseconds. If i and j are randomly selected integers between 1 and
200, the average value of $|i - j|$ is $2\binom{201}{3}/200^2 \approx 66.7$, so the average seek time is
about 60 ms. MIXTEC disks rotate once every 25 ms, so the latency time averages
about 12.5 ms. The transmission time for n characters is $(n/5000) \times 25$ ms = $5n$ μs.
(This is about 3½ times as fast as the transmission rate of the MIXT tapes
that were used in the examples of Section 5.4.6.)
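The averages quoted for the hypothetical MIXTEC disk can be reproduced with exact rational arithmetic. The sketch below is ours and the constant names are illustrative; note that the exact average seek time comes out near 58.3 ms, which the text rounds to "about 60 ms."

```python
from fractions import Fraction
from math import comb

CYLINDERS = 200          # cylinders per MIXTEC disk unit
REVOLUTION_MS = 25       # one revolution every 25 ms
TRACK_CHARS = 5000       # characters per track

# Exact average of |i - j| over all ordered pairs (i, j) of cylinders 1..200.
total = sum(abs(i - j) for i in range(1, CYLINDERS + 1) for j in range(1, CYLINDERS + 1))
avg_distance = Fraction(total, CYLINDERS ** 2)
assert avg_distance == Fraction(2 * comb(CYLINDERS + 1, 3), CYLINDERS ** 2)

avg_seek_ms = 25 + avg_distance / 2                         # seek time is 25 + |i - j|/2 ms
avg_latency_ms = Fraction(REVOLUTION_MS, 2)                 # half a revolution on the average
us_per_char = Fraction(REVOLUTION_MS * 1000, TRACK_CHARS)   # transmission time per character

print(float(avg_distance), float(avg_seek_ms), float(avg_latency_ms), us_per_char)
# 66.665 58.3325 12.5 5
```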

Thus  the  main  differences  between  MIXTEC  disks  and  MIXT  tapes  are  these: 

a)  Tapes  can  only  be  accessed  sequentially. 

b)  Individual  disk  operations  tend  to  require  significantly  more  overhead  (seek 
time  + latency  time  compared  to  stop/start  time). 

c)  The  disk  transmission  rate  is  faster. 

By  using  clever  merge  patterns  on  tape,  we  were  able  to  compensate  somewhat 
for  disadvantage  (a).  Our  goal  now  is  to  think  of  some  clever  algorithms  for  disk 
sorting  that  will  compensate  for  disadvantage  (b). 

Overcoming latency time. Let us consider first the problem of minimizing
the  delays  caused  by  the  fact  that  the  disks  aren’t  always  positioned  properly 
when  we  want  to  start  an  I/O  command.  We  can’t  make  the  disk  spin  faster, 
but  we  can  still  apply  some  tricks  that  reduce  or  even  eliminate  all  of  the  latency 
time.  The  addition  of  more  access  arms  would  obviously  help,  but  that  would 
be  an  expensive  hardware  modification.  Here  are  some  software  ideas: 

• If  we  read  or  write  several  tracks  of  a cylinder  at  a time,  we  avoid  the 
latency  time  (and  the  seek  time)  on  all  tracks  but  the  first.  In  general  it  is  often 
possible  to  synchronize  the  computing  time  with  the  disk  movement  in  such  a 
way  that  a sequence  of  input/output  instructions  can  be  carried  out  without 
latency  delays. 

• Consider  the  problem  of  reading  half  a track  of  data  (Fig.  90):  If  the  read 
command  begins  when  the  heads  are  at  axis  A,  there  is  no  latency  delay,  and  the 
total time for reading is just the transmission time, ½ × 25 ms. If the command
begins  with  the  heads  at  B,  we  need  | of  a revolution  for  latency  and  | for 
transmission,  totalling  ^ x 25  ms.  The  most  interesting  case  occurs  when  the 
Fig. 90. Analysis of the latency time when reading half of a track.

heads  are  initially  at  C:  With  proper  hardware  and  software  we  need  not  waste 
| of  a revolution  for  latency  delay.  Reading  can  begin  immediately,  into  the 
second half of the input buffer; then after a ½ × 25 ms pause, reading can resume
into the first half of the buffer, so that the instruction is completed when axis C
is reached again. In a similar manner, we can ensure that the total latency plus
transmission time will never exceed the time for one revolution, regardless of the
initial position of the disk. The average amount of latency delay is reduced by
this scheme from half a revolution to $\frac{1}{2}(1 - x^2)$ of a revolution, if we are reading
or writing a given fraction x of a track, for $0 < x \le 1$. When an entire track is
being read or written (x = 1), this technique eliminates all the latency time.
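A brute-force average confirms the formula $\frac{1}{2}(1 - x^2)$. The model below is a simplification we introduce for illustration: positions are measured in revolutions, the data occupies the arc [0, x), the heads start at a uniformly distributed point, and the function names are hypothetical.

```python
def latency(theta, x):
    """Latency, in revolutions, to read the arc [0, x) when the heads start at theta.

    Uses the trick in the text: read whatever part of the arc is under the heads first."""
    if theta < x:
        # Read [theta, x) at once, idle over the gap [x, 1), then read [0, theta).
        # The operation takes exactly one revolution, x of which is transmission.
        return 1 - x
    # The heads are past the data: wait until position 0 comes around again.
    return 1 - theta

def average_latency(x, samples=100_000):
    """Average over uniformly spaced starting positions (midpoint rule)."""
    return sum(latency((i + 0.5) / samples, x) for i in range(samples)) / samples

for x in (0.1, 0.25, 0.5, 0.9, 1.0):
    # the last two columns should agree
    print(x, round(average_latency(x), 6), (1 - x * x) / 2)
```

Without the trick, the heads always wait for position 0, so the average latency would be ½ revolution for every x.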

Drums:  The  no-seek  case.  Some  external  memory  units,  traditionally  called 
drum  memories,  eliminate  the  seek  time  by  having  one  read/write  head  for  every 
track.  If  the  technique  of  Fig.  90  is  employed  on  such  devices,  both  seek  time 
and  latency  time  reduce  to  zero,  provided  that  we  always  read  or  write  a track 
at  a time;  this  is  the  ideal  situation  in  which  transmission  time  is  the  only 
limiting  factor. 

Let  us  consider  again  the  example  application  of  Section  5.4.6,  sorting 
100,000  records  of  100  characters  each,  with  a 100,000-character  internal  memory. 
The  total  amount  of  data  to  be  sorted  fills  half  of  a MIXTEC  disk.  It  is  usually 
impossible  to  read  and  write  simultaneously  on  a single  disk  unit;  we  shall  assume 
that  two  disks  are  available,  so  that  reading  and  writing  can  overlap  each  other. 
For  the  moment  we  shall  assume,  in  fact,  that  the  disks  are  actually  drums, 
containing  4000  tracks  of  5000  characters  each,  with  no  seek  time  required. 

What  sorting  algorithm  should  be  used?  The  method  of  merging  is  a fairly 
natural  choice;  other  methods  of  internal  sorting  do  not  lend  themselves  so  well 
to  a disk  implementation,  except  for  the  radix  techniques  of  Section  5.2.5.  The 
considerations  of  Section  5.4.7  show  that  radix  sorting  is  usually  inferior  to 
merging  for  general-purpose  applications,  because  the  duality  theorem  of  that 
section  applies  to  disks  as  well  as  to  tapes.  Radix  sorting  does  have  a strong 
advantage,  however,  when  the  keys  are  uniformly  distributed  and  many  disks 
can  be  used  in  parallel,  because  an  initial  distribution  by  the  most  significant 
digits  of  the  keys  will  divide  the  work  up  into  independent  subproblems  that 
need  no  further  communication.  (See,  for  example,  R.  C.  Agarwal,  SIGMOD 
Record  25,2  (June  1996),  240-246.) 
We will concentrate on merge sorting in the following discussion. To begin
a merge  sort  for  the  stated  problem  we  can  use  replacement  selection,  with  two 
5000-character  input  buffers  and  two  5000-character  output  buffers.  In  fact,  it  is 
possible  to  reduce  this  to  three  5000-character  buffers,  if  records  in  the  current 
input  buffer  are  replaced  by  records  that  come  off  the  selection  tree.  That  leaves 
85,000  characters  (850  records)  for  a selection  tree,  so  one  pass  over  our  example 
data  will  form  about  60  initial  runs.  (See  Eq.  5.4.6-(3).)  This  pass  takes  only 
about  50  seconds,  if  we  assume  that  the  internal  processing  time  is  fast  enough 
to  keep  up  with  the  input/output  rate,  with  one  record  moving  to  the  output 
buffer  every  500  microseconds.  If  the  input  to  be  sorted  appeared  on  a MIXT 
tape,  instead  of  a drum,  this  pass  would  be  slower,  governed  by  the  tape  speed. 
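The figures in this paragraph follow from a little arithmetic. The sketch below is ours and simply restates the assumptions of the text as constants: a 100,000-character memory with three 5000-character buffers, runs about twice the size of the selection tree as in Eq. 5.4.6-(3), and 5 μs of transmission time per character.

```python
N_RECORDS = 100_000
RECORD_CHARS = 100
MEMORY_CHARS = 100_000
BUFFER_CHARS = 5_000          # one full track per buffer
US_PER_CHAR = 5               # MIXTEC transmission time per character, in microseconds

tree_records = (MEMORY_CHARS - 3 * BUFFER_CHARS) // RECORD_CHARS   # records in the selection tree
runs = N_RECORDS / (2 * tree_records)         # replacement selection: runs about twice the tree size
record_time_us = RECORD_CHARS * US_PER_CHAR   # time to move one record
pass_seconds = N_RECORDS * record_time_us / 1e6

print(tree_records, round(runs, 1), record_time_us, pass_seconds)   # 850 58.8 500 50.0
```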

With  two  drums  and  full-track  reading/writing,  it  is  not  hard  to  see  that 
the total transmission time for P-way merging is minimized if we let P be as
large  as  possible.  Unfortunately  we  can’t  simply  do  a 60-way  merge  on  all  of  the 
initial  runs,  since  there  isn’t  room  for  60  buffers  in  memory.  (A  buffer  of  fewer 
than  5000  characters  would  introduce  unwanted  latency  time.  Remember  that 
we  are  still  pretending  to  be  living  in  the  1970s,  when  internal  memory  space  was 
significantly limited.) If we do P-way merges, passing all the data from one drum
to the other so that reading and writing are overlapped, the number of merge
passes is $\lceil \log_P 60 \rceil$, so we may complete the job in two passes if $8 \le P \le 59$.
The smallest such P reduces the amount of internal computing, so we choose
P = 8; if 65 initial runs had been formed we would take P = 9. If 82 or more
initial runs had been formed, we could take P = 10, but since there is room
for only 18 input buffers and 2 output buffers there would be a possibility of
hangup during the merge (see Algorithm 5.4.6F); it may be better in such a case
to do two partial passes over a small portion of the data, reducing the number
of initial runs to 81 or less.
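The choice of P can be checked mechanically. The helpers below are hypothetical names of our own; they compute $\lceil \log_P S \rceil$ with integer arithmetic only (to avoid rounding trouble with floating-point logarithms) and find the smallest order that finishes in two passes. They ignore the buffer limit and the hangup question discussed above.

```python
def passes_needed(s, p):
    """Number of P-way merge passes for s initial runs, i.e. ceil(log_P s)."""
    passes, capacity = 0, 1
    while capacity < s:
        capacity *= p
        passes += 1
    return passes

def smallest_order(s, passes=2):
    """Smallest merge order P that completes s initial runs in the given number of passes."""
    p = 2
    while passes_needed(s, p) > passes:
        p += 1
    return p

for s in (60, 65, 82):
    print(s, smallest_order(s))   # prints 60 8, then 65 9, then 82 10
```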

Under  our  assumptions,  both  of  the  merging  passes  will  take  about  50 
seconds,  so  the  entire  sort  in  this  ideal  situation  will  be  completed  in  just  2.5 
minutes  (plus  a few  seconds  for  bookkeeping,  initialization,  etc.).  This  is  six 
times  faster  than  the  best  six-tape  sort  considered  in  Section  5.4.6;  the  reasons 
for  this  speedup  are  the  improved  external/internal  transmission  rate  (3.5  times 
faster),  the  higher  order  of  merge  (we  can’t  do  an  eight-way  tape  merge  unless  we 
have  nine  or  more  tapes),  and  the  fact  that  the  output  was  left  on  disk  (no  final 
rewind,  etc.,  was  necessary).  If  the  initial  input  and  sorted  output  were  required 
to  be  on  MIXT  tapes,  with  the  drums  used  for  merging  only,  the  corresponding 
sorting  time  would  have  been  about  8.2  minutes. 

If  only  one  drum  were  available  instead  of  two,  the  input-output  time  would 
take  twice  as  long,  since  reading  and  writing  must  be  done  separately.  (In  fact, 
the  input-output  operations  might  take  three  times  as  long,  since  we  would  be 
overwriting  the  initial  input  data;  in  such  a case  it  is  prudent  to  follow  each  write 
by  a read-back  check  operation,  lest  some  of  the  input  data  be  irretrievably 
lost,  if  the  hardware  does  not  provide  automatic  verification  of  written  informa- 
tion.) But  some  of  this  excess  time  can  be  recovered  because  we  can  use  partial 
pass methods that process some data records more often than others. The
two-drum case requires all data to be processed an even number or an odd number
of  times,  but  the  one-drum  case  can  use  more  general  merge  patterns. 

We  observed  in  Section  5.4.4  that  merge  patterns  can  be  represented  by  trees, 
and  that  the  transmission  time  corresponding  to  a merge  pattern  is  proportional 
to  the  external  path  length  of  its  tree.  Only  certain  trees  (T-lifo  or  strongly 
T-fifo)  could  be  used  as  efficient  tape  merging  patterns,  because  some  runs  get 
buried  in  the  middle  of  a tape  as  the  merging  proceeds.  But  on  disks  or  drums, 
all  trees  define  usable  merge  patterns  if  the  degrees  of  their  internal  nodes  are 
not  too  large  for  the  available  internal  memory  size. 

Therefore we can minimize transmission time by choosing a tree with minimum
external path length, such as a complete P-ary tree where P is as large as
possible. By Eq. 5.4.4-(9), the external path length of such a tree is equal to

$$qS - \left\lfloor \frac{P^q - S}{P - 1} \right\rfloor, \qquad q = \lceil \log_P S \rceil, \qquad (1)$$

if there are S external nodes (leaves).
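As a quick check on (1), here is a small Python sketch; the function name `complete_tree_path_length` is ours, not the book's. It computes $q = \lceil \log_P S \rceil$ by repeated multiplication rather than with floating-point logarithms, and dividing its result by S gives the number of passes over the data.

```python
def complete_tree_path_length(S: int, P: int) -> int:
    """External path length (1) of a complete P-ary tree with S leaves."""
    q, power = 0, 1
    while power < S:              # find q = ceil(log_P S) together with power = P**q
        q, power = q + 1, power * P
    return q * S - (power - S) // (P - 1)


# The case of Fig. 91, and the number of passes for a nine-way merge of 60 runs.
print(complete_tree_path_length(6, 3))
print(complete_tree_path_length(60, 9) / 60)
```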

It is particularly easy to design an algorithm that merges according to
the complete P-ary tree pattern. See, for example, Fig. 91, which shows the
case P = 3, S = 6. First we add dummy runs, if necessary, to make $S \equiv 1$
(modulo P − 1); then we combine runs according to a first-in-first-out discipline,
at every stage merging the P oldest runs at the front of the queue into a single
run that is placed at the rear.


The  complete  P- ary  tree  gives  an  optimum  pattern  if  all  of  the  initial  runs 
are  the  same  length,  but  we  can  often  do  better  if  some  runs  are  longer  than 
others.  An  optimum  pattern  for  this  general  situation  can  be  constructed  without 
difficulty  by  using  Huffman’s  method  (exercise  2.3.4.5-10),  which  may  be  stated 
in merging language as follows: “First add $(1 - S) \bmod (P - 1)$ dummy runs of
length  0.  Then  repeatedly  merge  together  the  P shortest  existing  runs  until  only 
one  run  is  left.”  When  all  initial  runs  have  the  same  length  this  method  reduces 
to  the  FIFO  discipline  described  above. 
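Huffman's rule, as quoted above, can be sketched in a few lines of Python with a binary heap. The function name `huffman_merge_cost` is ours; it returns the total number of record transmissions, that is, the sum of the lengths of all runs produced by merging, which equals the weighted external path length of the merge tree.

```python
import heapq


def huffman_merge_cost(lengths, P):
    """Total transmission for Huffman's P-way merge pattern on runs of the given lengths."""
    # Add (1 - S) mod (P - 1) dummy runs of length 0, so that every merge is P-way.
    runs = list(lengths) + [0] * ((1 - len(lengths)) % (P - 1))
    heapq.heapify(runs)
    total = 0
    while len(runs) > 1:
        merged = sum(heapq.heappop(runs) for _ in range(P))  # the P shortest runs
        total += merged                                      # each of their records moves once
        heapq.heappush(runs, merged)
    return total


print(huffman_merge_cost([1] * 6, 3))          # equal runs: same cost as the FIFO pattern
print(huffman_merge_cost([5, 1, 1, 1, 1], 2))  # unequal runs
```

With equal lengths the heap always pops the oldest runs first, which is why the method reduces to the FIFO discipline.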

In  our  100,000-record  example  we  can  do  nine-way  merging,  since  18  input 
buffers  and  two  output  buffers  will  fit  in  memory  and  Algorithm  5.4.6F  will 
overlap all compute time. The complete 9-ary tree with 60 leaves corresponds
to a merging pattern with $1\frac{29}{30}$ passes (by (1), its external path length
is 118), if all initial runs have the same length. The total sorting time with one
drum, using read-back check after every write, therefore comes to about 7.4
minutes. A higher value of P may reduce this
running  time  slightly;  but  the  situation  is  complicated  because  “reading  hangup” 
might  occur  when  the  buffers  become  too  full  or  too  empty. 

The  influence  of  seek  time.  Our  discussion  shows  that  it  is  relatively  easy  to 
construct  optimum  merging  patterns  for  drums,  because  seek  time  and  latency 
time  can  be  essentially  nonexistent.  But  when  disks  are  used  with  small  buffers 
we  often  spend  more  time  seeking  information  than  reading  it,  so  the  seek  time 
has  a considerable  influence  on  the  sorting  strategy.  Decreasing  the  order  of 
merge, P, makes it possible to use larger buffers, so fewer seeks are required;
this  often  compensates  for  the  extra  transmission  time  demanded  by  the  smaller 
value  of  P. 

Seek  time  depends  on  the  distance  traveled  by  the  access  arm,  and  we  could 
try  to  arrange  things  so  that  this  distance  is  minimized.  For  example,  it  may  be 
wise  to  sort  the  records  within  cylinders  first.  However,  large-scale  merging 
requires  a good  deal  of  jumping  around  between  cylinders  (see  exercise  2). 
Furthermore,  the  multiprogramming  capability  of  modern  operating  systems 
means  that  users  tend  to  lose  control  over  the  position  of  disk  access  arms. 
We  are  often  justified,  therefore,  in  assuming  that  each  disk  command  involves 
a “random”  seek. 

Our  goal  is  to  discover  a merge  pattern  that  achieves  the  best  balance 
between  seek  time  and  transmission  time.  For  this  purpose  we  need  some  way 
to  estimate  the  goodness  of  any  particular  tree  with  respect  to  a particular 
hardware  configuration.  Consider,  for  example,  the  tree  in  Fig.  92;  we  want  to 
estimate  how  long  it  will  take  to  carry  out  the  corresponding  merge,  so  that  we 
can  compare  this  tree  to  other  trees. 

In  the  following  discussion  we  shall  make  some  simple  assumptions  about 
disk  merging,  in  order  to  illustrate  some  of  the  general  ideas.  Let  us  suppose  that 
(i)  it  takes  72.5  + 0.005n  milliseconds  to  read  or  write  n characters;  (ii)  100,000 
characters of internal memory are available for working storage; (iii) an average
of 0.004 milliseconds of computation time is required to transmit each character
from  input  to  output;  (iv)  there  is  to  be  no  overlap  between  reading,  writing, 
or  computing;  and  (v)  the  buffer  size  used  on  output  need  not  be  the  same  as 
the  buffer  size  used  to  read  the  data  on  the  following  pass.  An  analysis  of  the 
sorting  problem  under  these  simple  assumptions  will  give  us  some  insights  when 
we  turn  to  more  complicated  situations. 

If we do a P-way merge, we can divide the internal working storage into P + 1
buffer  areas,  P for  input  and  one  for  output,  with  B = 100000/(P+1)  characters 
per  buffer.  Suppose  the  files  being  merged  contain  a total  of  L characters;  then 
we  will  do  approximately  L/B  output  operations  and  about  the  same  number 
of  input  operations,  so  the  total  merging  time  under  our  assumptions  will  be 
approximately 

$$2\left(\frac{72.5L}{B} + 0.005L\right) + 0.004L = (0.00145P + 0.01545)L \qquad (2)$$

milliseconds.
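The arithmetic behind (2) is easy to reproduce. In this sketch the keyword parameters simply encode assumptions (i) through (iii) and the 100,000-character memory; the parameter names are ours. The loop checks that the result is linear in P with the coefficients 0.00145 and 0.01545.

```python
import math


def merge_time_ms(P, L, memory=100_000, seek_ms=72.5, io_ms_per_char=0.005, cpu_ms_per_char=0.004):
    """Approximate time (2), in milliseconds, for a P-way merge of L characters."""
    B = memory / (P + 1)          # P input buffers plus one output buffer
    transfers = L / B             # about L/B reads, and about as many writes
    return 2 * (seek_ms * transfers + io_ms_per_char * L) + cpu_ms_per_char * L


L = 1_000_000
for P in (2, 8, 16):
    assert math.isclose(merge_time_ms(P, L), (0.00145 * P + 0.01545) * L)
```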
Fig. 92. A tree whose external path length is 16 and whose degree path length is 52.


In other words, a P-way merge of L characters takes about $(\alpha P + \beta)L$ units
of time, for some constants $\alpha$ and $\beta$ depending on the seek time, latency time,
compute time, and memory size. This formula leads to an interesting way to
construct good merge patterns for disks. Consider Fig. 92, for example, and
assume that all initial runs (represented by square leaf nodes) have length $L_0$.
Then the merges at nodes 9 and 10 each take $(2\alpha + \beta)(2L_0)$ units of time, the
merge at node 11 takes $(3\alpha + \beta)(4L_0)$, and the final merge at node 12 takes
$(4\alpha + \beta)(8L_0)$. The total merging time therefore comes to $(52\alpha + 16\beta)L_0$ units.
The coefficient “16” here is well known to us; it is simply the external path
length of the tree. The coefficient “52” of $\alpha$ is, however, a new concept, which
we may call the degree path length of the tree; it is the sum, taken over all leaf
nodes, of the internal-node degrees on the path from the leaf to the root. For
example, in Fig. 92 the degree path length is

$$(2 + 4) + (2 + 4) + (3 + 4) + (2 + 3 + 4) + (2 + 3 + 4) + (3 + 4) + (4) + (4) = 52.$$

If  T is  any  tree,  let  D(T)  and  E(T)  denote  its  degree  path  length  and  its 
external  path  length,  respectively.  Our  analysis  may  be  summarized  as  follows: 

Theorem H. If the time required to do a P-way merge on L characters has
the form $(\alpha P + \beta)L$, and if there are S equal-length runs to be merged, the best
merge pattern corresponds to a tree T for which $\alpha D(T) + \beta E(T)$ is a minimum,
over all trees having S leaves. ∎

(This  theorem  was  implicitly  contained  in  an  unpublished  paper  that  George  U. 
Hubbard  presented  at  the  ACM  National  Conference  in  1963.) 
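Both path lengths can be computed in one traversal. In this Python sketch a merge tree is written as nested lists whose non-list entries are initial runs, a representation chosen only for illustration. Since Fig. 92 itself is not reproduced here, its shape is inferred from the degree path length sum above; the order of the children is a guess, but it does not affect D(T) or E(T). The cost of the whole merge is then $(\alpha D(T) + \beta E(T))L_0$.

```python
def path_lengths(tree):
    """Return (D, E, leaves) for a merge tree given as nested lists; non-lists are leaves."""
    if not isinstance(tree, list):
        return 0, 0, 1
    D = E = leaves = 0
    for child in tree:
        d, e, n = path_lengths(child)
        D, E, leaves = D + d, E + e, leaves + n
    # Every leaf below this node gains one more edge (E) and this node's degree (D).
    return D + len(tree) * leaves, E + leaves, leaves


# Fig. 92 as inferred from the text: node 12 (the root) has node 9, node 11, and two
# leaves as children; node 11 has a leaf, node 10, and a leaf; nodes 9 and 10 have two leaves.
run = "run"
node9, node10 = [run, run], [run, run]
node11 = [run, node10, run]
fig92 = [node9, node11, run, run]
print(path_lengths(fig92))
```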

Let $\alpha$ and $\beta$ be fixed constants; we shall say a tree is optimal if it has the
minimum value of $\alpha D(T) + \beta E(T)$ over all trees, T, with the same number of
leaves. It is not difficult to see that all subtrees of an optimal tree are optimal,
and therefore we can construct optimal trees with n leaves by piecing together
optimal trees with fewer than n leaves.
Theorem K. Let the sequence of numbers $A_m(n)$ be defined for $1 \le m \le n$ by
the rules

$$A_1(1) = 0; \qquad (3)$$

$$A_m(n) = \min_{1 \le k \le n/m} \bigl(A_1(k) + A_{m-1}(n - k)\bigr), \quad \text{for } 2 \le m \le n; \qquad (4)$$

$$A_1(n) = \min_{2 \le m \le n} \bigl(\alpha m n + \beta n + A_m(n)\bigr), \quad \text{for } n \ge 2. \qquad (5)$$

Then $A_1(n)$ is the minimum value of $\alpha D(T) + \beta E(T)$ over all trees T with
n leaves.

Proof. Equation (4) implies that $A_m(n)$ is the minimum value of
$A_1(n_1) + \cdots + A_1(n_m)$ taken over all positive integers $n_1, \ldots, n_m$ such that
$n_1 + \cdots + n_m = n$ (restricting k to be at most n/m loses nothing, since k can
be taken to be the smallest of the $n_i$). The result now follows by induction on n. ∎

The recurrence relations (3), (4), (5) can also be used to construct the
optimal trees themselves: Let $k_m(n)$ be a value for which the minimum occurs
in the definition of $A_m(n)$. Then we can construct an optimal tree with n leaves
by joining $m = k_1(n)$ subtrees at the root; the subtrees are optimal trees with

$$k_m(n), \quad k_{m-1}\bigl(n - k_m(n)\bigr), \quad k_{m-2}\bigl(n - k_m(n) - k_{m-1}(n - k_m(n))\bigr), \quad \ldots$$

leaves, respectively.

For example, Table 1 illustrates this construction when $\alpha = \beta = 1$. A compact
specification of the corresponding optimal trees appears at the right of the
table; the entry “4:9:9” when n = 22 means, for example, that an optimal tree
$T_{22}$ with 22 leaves may be obtained by combining $T_4$, $T_9$, and $T_9$ (see Fig. 93).
Optimal trees are not unique; for instance, 5:8:9 would be just as good as 4:9:9.
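Recurrences (3) through (5) translate directly into a dynamic program. In this Python sketch the names `optimal_tree_table` and `tree_spec` are ours. Ties are broken in favor of the smallest k, which appears to match the $k_m(n)$ entries of Table 1; other tie-breaking rules give different but equally good trees, such as 5:8:9 for n = 22.

```python
def optimal_tree_table(N, alpha=1, beta=1):
    """A[(m, n)] and k[(m, n)] from recurrences (3)-(5), for 1 <= m <= n <= N."""
    A, k = {(1, 1): 0}, {(1, 1): 0}
    for n in range(2, N + 1):
        for m in range(2, n + 1):                    # recurrence (4)
            best = None
            for j in range(1, n // m + 1):           # j = leaves in the smallest subtree
                value = A[(1, j)] + A[(m - 1, n - j)]
                if best is None or value < best:     # strict '<' keeps the smallest j on ties
                    best, k[(m, n)] = value, j
            A[(m, n)] = best
        best = None
        for m in range(2, n + 1):                    # recurrence (5)
            value = alpha * m * n + beta * n + A[(m, n)]
            if best is None or value < best:         # ties go to the smallest degree m
                best, k[(1, n)] = value, m
        A[(1, n)] = best
    return A, k


def tree_spec(n, k):
    """Subtree sizes at the root of the optimal tree, in the 'i:j:...' notation of Table 1."""
    m, rest, sizes = k[(1, n)], n, []
    for j in range(m, 1, -1):
        sizes.append(k[(j, rest)])
        rest -= sizes[-1]
    sizes.append(rest)
    return ":".join(str(s) for s in sizes)


A, k = optimal_tree_table(25)
for n in (6, 7, 22):
    print(n, A[(1, n)], tree_spec(n, k))
```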


Fig. 93. An optimum way to merge 22 initial runs of equal length, when $\alpha = \beta$ in
Theorem H. This pattern minimizes the seek time, under the assumptions leading to
Eq. (2) in the text.

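Our derivation of (2) shows that the relation $\alpha < \beta$ will hold whenever
P + 1 equal buffer areas are used (the seek and latency term contributes equally
to $\alpha$ and $\beta$, while transmission and computation contribute only to $\beta$). The
limiting case $\alpha = \beta$, shown in Table 1 and Fig. 93, occurs when the seek time
itself is to be minimized without regard to transmission time.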

Returning to our original application, we still haven’t considered how to
get the initial runs in the first place; without read/write/compute overlap,
replacement selection loses some of its advantages. Perhaps we should fill the
entire internal memory, sort it, and output the results; such input and output
operations can each be done with one seek. Or perhaps we are better off using,
say, 20 percent of the memory as a combination input/output buffer, and doing
replacement selection. This requires five times as many seeks (an extra 60
seconds or so!), but it reduces the number of initial runs from 100 to 64; the
reduction would be more dramatic if the input file were pretty much in order
already.

Table 1

OPTIMAL TREE CHARACTERISTICS $A_m(n)$, $k_m(n)$ WHEN $\alpha = \beta = 1$

  n    m=1      2      3      4      5      6      7      8      9     10     11     12  Tree        n
  1    0,0                                                                               —           1
  2    6,2    0,1                                                                        1:1         2
  3   12,3    6,1    0,1                                                                 1:1:1       3
  4   20,4   12,1    6,1    0,1                                                          1:1:1:1     4
  5   30,5   18,2   12,1    6,1    0,1                                                   1:1:1:1:1   5
  6   42,2   24,3   18,1   12,1    6,1    0,1                                            3:3         6
  7   52,3   32,3   24,1   18,1   12,1    6,1    0,1                                     1:3:3       7
  8   62,3   40,4   30,2   24,1   18,1   12,1    6,1    0,1                              2:3:3       8
  9   72,3   50,4   36,3   30,1   24,1   18,1   12,1    6,1    0,1                       3:3:3       9
 10   84,3   60,5   44,3   36,1   30,1   24,1   18,1   12,1    6,1    0,1                3:3:4      10
 11   96,3   72,4   52,3   42,2   36,1   30,1   24,1   18,1   12,1    6,1    0,1         3:4:4      11
 12  108,3   82,4   60,4   48,3   42,1   36,1   30,1   24,1   18,1   12,1    6,1    0,1  4:4:4      12
 13  121,4   92,4   70,4   56,3   48,1   42,1   36,1   30,1   24,1   18,1   12,1    6,1  3:3:3:4    13
 14  134,4  102,5   80,4   64,3   54,2   48,1   42,1   36,1   30,1   24,1   18,1   12,1  3:3:4:4    14
 15  147,4  114,5   90,4   72,3   60,3   54,1   48,1   42,1   36,1   30,1   24,1   18,1  3:4:4:4    15
 16  160,4  124,7  102,4   80,4   68,3   60,1   54,1   48,1   42,1   36,1   30,1   24,1  4:4:4:4    16
 17  175,4  134,8  112,4   90,4   76,3   66,2   60,1   54,1   48,1   42,1   36,1   30,1  4:4:4:5    17
 18  190,4  144,9  122,4  100,4   84,3   72,3   66,1   60,1   54,1   48,1   42,1   36,1  4:4:5:5    18
 19  205,4  156,9  132,5  110,4   92,3   80,3   72,1   66,1   60,1   54,1   48,1   42,1  4:5:5:5    19
 20  220,4  168,9  144,4  120,5  100,4   88,3   78,2   72,1   66,1   60,1   54,1   48,1  5:5:5:5    20
 21  236,5  180,9  154,4  132,4  110,4   96,3   84,3   78,1   72,1   66,1   60,1   54,1  4:4:4:4:5  21
 22  252,3 192,10  164,4  142,4  120,4  104,3   92,3   84,1   78,1   72,1   66,1   60,1  4:9:9      22
 23  266,3 204,11  174,5  152,4  130,4  112,3  100,3   90,2   84,1   78,1   72,1   66,1  5:9:9      23
 24  282,3 216,12  186,5  162,5  140,4  120,4  108,3   96,3   90,1   84,1   78,1   72,1  5:9:10     24
 25  296,3 229,12  196,7  174,4  150,5  130,4  116,3  104,3   96,1   90,1   84,1   78,1  7:9:9      25

If we decide not to use replacement selection, the optimum tree for S = 100,
$\alpha = 0.00145$, $\beta = 0.01545$ [see (2)] turns out to be rather prosaic: It is simply a
10-way  merge,  completed  in  two  passes  over  the  data.  Allowing  30  seconds  for 
internal  sorting  (100  quicksorts,  say),  the  initial  distribution  pass  takes  about 
2.5  minutes,  and  the  merge  passes  each  take  almost  5 minutes,  for  a total  of 
12.4  minutes.  If  we  decide  to  use  replacement  selection,  the  optimal  tree  for 
S = 64  turns  out  to  be  equally  uninteresting  (two  8- way  merge  passes);  the  initial 
distribution  pass  takes  about  3.5  minutes,  the  merge  passes  each  take  about  4.5 
minutes,  and  the  estimated  total  time  comes  to  12.6  minutes.  Remember  that 
both  of  these  methods  give  up  virtually  all  read/write/compute  overlap  in  order 
to  have  larger  buffers,  reducing  seek  time.  None  of  these  estimated  times  includes 
the  time  that  might  be  necessary  for  read-back  check  operations. 

In  practice  the  final  merge  pass  tends  to  be  quite  different  from  the  others; 
for  example,  the  output  is  often  expanded  and/or  written  onto  tape.  In  such 
cases  the  tree  pattern  should  be  chosen  using  a different  optimality  criterion  at 
the  root. 
A closer look at optimal trees. It is interesting to examine the extreme case
$\beta = 0$ in Theorems H and K, even though practical situations usually lead to
parameters with $0 < \alpha < \beta$. What tree with n leaves has the smallest possible
degree path length? Curiously it turns out that three-way merging is best.


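Theorem L. The degree path length of a tree with n leaves is never less than

$$f(n) = \begin{cases} 3qn + 2(n - 3^q), & \text{if } 2 \cdot 3^{q-1} \le n \le 3^q; \\ 3qn + 4(n - 3^q), & \text{if } 3^q \le n \le 2 \cdot 3^q. \end{cases} \qquad (6)$$

Ternary trees $T_n$ defined by the rules

$$T_1 = \text{a single leaf}; \quad T_2 = \text{a node with two leaves}; \quad T_n = \text{a node with subtrees } T_{\lfloor n/3 \rfloor},\ T_{\lfloor (n+1)/3 \rfloor},\ T_{\lfloor (n+2)/3 \rfloor} \ \ (n \ge 3) \qquad (7)$$

have the minimum degree path length.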


Proof. It is important to observe that f(n) is a convex function, namely that

$$f(n+1) - f(n) \ge f(n) - f(n-1) \quad \text{for all } n \ge 2. \qquad (8)$$

The  relevance  of  this  property  is  due  to  the  following  lemma,  which  is  dual  to 
the  result  of  exercise  2.3.4.5-17. 


Lemma C. A function g(n) defined on the positive integers satisfies

$$\min_{1 \le k < n} \bigl(g(k) + g(n - k)\bigr) = g(\lfloor n/2 \rfloor) + g(\lceil n/2 \rceil), \qquad n \ge 2, \qquad (9)$$

if and only if it is convex.

Proof. If $g(n+1) - g(n) < g(n) - g(n-1)$ for some $n \ge 2$, we have
$g(n+1) + g(n-1) < g(n) + g(n)$, contradicting (9). Conversely, if (8) holds for g,
and if $1 \le k < n - k$, we have $g(k+1) + g(n-k-1) \le g(k) + g(n-k)$ by convexity,
so moving the two parts closer together never increases the sum. ∎
The latter part of Lemma C’s proof can be extended for any $m \ge 2$ to show
that

$$\min_{\substack{n_1 + \cdots + n_m = n \\ n_1, \ldots, n_m \ge 1}} \bigl(g(n_1) + \cdots + g(n_m)\bigr) = g(\lfloor n/m \rfloor) + g(\lfloor (n+1)/m \rfloor) + \cdots + g(\lfloor (n+m-1)/m \rfloor) \qquad (10)$$

whenever g is convex. Let

$$f_m(n) = f(\lfloor n/m \rfloor) + f(\lfloor (n+1)/m \rfloor) + \cdots + f(\lfloor (n+m-1)/m \rfloor); \qquad (11)$$

the proof of Theorem L is completed by proving that $f_3(n) + 3n = f(n)$ and
$f_m(n) + mn \ge f(n)$ for all $m \ge 2$. (See exercise 11.) ∎
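The identities that complete the proof are easy to spot-check numerically. This Python sketch (function names ours) evaluates f(n) from (6) and $f_m(n)$ from (11), and compares f(n) with the smallest degree path length found by running Theorem K with $\alpha = 1$, $\beta = 0$. It is a sanity check for small n, not a proof.

```python
def f(n):
    """The lower bound (6) on the degree path length of a tree with n leaves."""
    p, q = 1, 0                       # p = 3**q, the largest power of 3 not exceeding n
    while 3 * p <= n:
        p, q = 3 * p, q + 1
    if n <= 2 * p:                    # case 3**q <= n <= 2 * 3**q
        return 3 * q * n + 4 * (n - p)
    return 3 * (q + 1) * n + 2 * (n - 3 * p)   # case 2 * 3**q < n < 3**(q+1)


def f_m(n, m):
    """f_m(n) from (11); assumes n >= m, so that every argument of f is positive."""
    return sum(f((n + i) // m) for i in range(m))


def min_degree_path_lengths(N):
    """A_1(1..N) from Theorem K with alpha = 1 and beta = 0, i.e. the smallest D(T)."""
    A = {(1, 1): 0}
    for n in range(2, N + 1):
        for m in range(2, n + 1):
            A[(m, n)] = min(A[(1, k)] + A[(m - 1, n - k)] for k in range(1, n // m + 1))
        A[(1, n)] = min(m * n + A[(m, n)] for m in range(2, n + 1))
    return [A[(1, n)] for n in range(1, N + 1)]


print([f(n) for n in range(1, 10)])
print(min_degree_path_lengths(9))
```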

It would be very nice if optimal trees could always be characterized neatly
as in Theorem L. But the results we have seen for $\alpha = \beta$ in Table 1 show that
the function $A_1(n)$ is not always convex; for example, $A_1(22) - A_1(21) = 16$ but
$A_1(23) - A_1(22) = 14$. In fact, Table 1 is sufficient to disprove most simple
conjectures about optimal trees! We can, however, salvage part of Theorem L in
the general case; M. Schlumberger and J. Vuillemin have shown that large orders
of merge can always be avoided:
Theorem M. Given $\alpha$ and $\beta$ as in Theorem H, there exists an optimal tree in
which the degree of every node is at most

$$d(\alpha, \beta) = \left\lceil \min_{k \ge 1} \left( k + \left(1 + \frac{1}{k}\right)\left(1 + \frac{\beta}{\alpha}\right) \right) \right\rceil. \qquad (12)$$

Proof. Let $n_1, \ldots, n_m$ be positive integers such that $n_1 + \cdots + n_m = n$,
$A_1(n_1) + \cdots + A_1(n_m) = A_m(n)$, and $n_1 \le \cdots \le n_m$, and assume that
$m \ge d(\alpha, \beta) + 1$. Let k be the value that minimizes (12); we shall show that

$$\alpha n(m - k) + \beta n + A_{m-k}(n) \le \alpha n m + \beta n + A_m(n), \qquad (13)$$

hence the minimum value in (5) is always achieved for some $m \le d(\alpha, \beta)$.

By definition, since $m \ge k + 2$, we must have

$$\begin{aligned}
A_{m-k}(n) &\le A_1(n_1 + \cdots + n_{k+1}) + A_1(n_{k+2}) + \cdots + A_1(n_m) \\
&\le \alpha(n_1 + \cdots + n_{k+1})(k+1) + \beta(n_1 + \cdots + n_{k+1}) + A_1(n_1) + \cdots + A_1(n_m) \\
&= \bigl(\alpha(k+1) + \beta\bigr)(n_1 + \cdots + n_{k+1}) + A_m(n) \\
&\le \bigl(\alpha(k+1) + \beta\bigr)(k+1)n/m + A_m(n),
\end{aligned}$$

and (13) now follows easily, since $m \ge d(\alpha, \beta) + 1$ implies
$\bigl(\alpha(k+1) + \beta\bigr)(k+1)/m \le \alpha k$. (Careful inspection of this proof shows that
(12) is best possible, in the sense that some optimal trees must have nodes of
degree $d(\alpha, \beta)$; see exercise 13.) ∎
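Bound (12) can be evaluated exactly with rational arithmetic, and the theorem can be spot-checked by rerunning recurrences (3) through (5) with every node degree capped. This sketch takes the bracket in (12) to be a ceiling; the function names and the cap parameter `max_degree` are ours. Because $k + (1 + 1/k)(1 + \beta/\alpha)$ has the convex form $k + c + c/k$, the search over k stops as soon as the values stop decreasing.

```python
import math
from fractions import Fraction


def degree_bound(alpha, beta):
    """d(alpha, beta) from (12), read as the ceiling of the minimum, computed exactly."""
    ratio = Fraction(beta) / Fraction(alpha)
    best, k = None, 1
    while True:
        x = k + (1 + Fraction(1, k)) * (1 + ratio)
        if best is not None and x >= best:   # convex in k, so the minimum is already found
            break
        best, k = x, k + 1
    return math.ceil(best)


def optimal_costs(N, alpha, beta, max_degree=None):
    """A_1(1..N) from (3)-(5); if max_degree (>= 2) is given, no node may exceed it."""
    A = {(1, 1): Fraction(0)}
    for n in range(2, N + 1):
        top = n if max_degree is None else min(n, max_degree)
        for m in range(2, top + 1):
            A[(m, n)] = min(A[(1, k)] + A[(m - 1, n - k)] for k in range(1, n // m + 1))
        A[(1, n)] = min(alpha * m * n + beta * n + A[(m, n)] for m in range(2, top + 1))
    return [A[(1, n)] for n in range(1, N + 1)]


alpha, beta = Fraction(1), Fraction(3, 2)
d = degree_bound(alpha, beta)
print(d)
print(optimal_costs(30, alpha, beta) == optimal_costs(30, alpha, beta, max_degree=d))
```

For $\beta/\alpha = 3/2$ the bound is 6, and n = 6 shows that it is attained: a single six-way merge costs 45 units, while the best tree without a node of degree 6 costs 46.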

The construction in Theorem K needs $O(N^2)$ memory cells and $O(N^2 \log N)$
steps to evaluate $A_m(n)$ for $1 \le m \le n \le N$; Theorem M shows that only $O(N)$
cells and $O(N^2)$ steps are needed. Schlumberger and Vuillemin have discovered
several more very interesting properties of optimal trees [Acta Informatica 3
(1973), 25-36]. Furthermore the asymptotic value of $A_1(n)$ can be worked out
as shown in exercise 9.

*Another way to allocate buffers. David E. Ferguson [CACM 14 (1971),
476-478] pointed out that seek time can be reduced if we don’t make all buffers
the same size. The same idea occurred at about the same time to several other
people [S. J. Waters, Comp. J. 14 (1971), 109-112; Ewing S. Walker, Software
Age 4 (August-September, 1970), 16-17].

Suppose we are doing a four-way merge on runs of equal length $L_0$, with
M characters of memory. If we divide the memory into equal buffers of size
$B = M/5$, we need about $L_0/B$ seeks on each input file and $4L_0/B$ seeks for the
output, totalling $8L_0/B = 40L_0/M$ seeks. But if we use four input buffers of
size M/6 and one output buffer of size M/3, we need only about
$4 \times (6L_0/M) + 4 \times (3L_0/M) = 36L_0/M$ seeks! The transmission time is the same
in both cases, so we haven’t lost anything by the change.

In general, suppose that we want to merge sorted files of lengths $L_1, \ldots, L_P$
into a sorted file of length

$$L_{P+1} = L_1 + \cdots + L_P,$$

and assume that a buffer of size $B_k$ is being used for the kth file. Thus

$$B_1 + \cdots + B_P + B_{P+1} = M, \qquad (14)$$


where M is the total size of available internal memory. The number of seeks will
be approximately

$$\frac{L_1}{B_1} + \cdots + \frac{L_P}{B_P} + \frac{L_{P+1}}{B_{P+1}}. \qquad (15)$$


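Let’s try to minimize this quantity, subject to condition (14), assuming for
convenience that the $B_k$’s don’t have to be integers. If we increase $B_j$ by $\delta$
and decrease $B_k$ by the same amount, the number of seeks changes by

$$\frac{L_j}{B_j + \delta} - \frac{L_j}{B_j} + \frac{L_k}{B_k - \delta} - \frac{L_k}{B_k} = \left(\frac{L_k}{B_k(B_k - \delta)} - \frac{L_j}{B_j(B_j + \delta)}\right)\delta,$$

so the allocation can be improved if $L_j/B_j^2 \ne L_k/B_k^2$ (take $\delta$ small, with
the sign that makes this change negative). Therefore we get the minimum number
of seeks only if

$$\frac{L_1}{B_1^2} = \cdots = \frac{L_P}{B_P^2} = \frac{L_{P+1}}{B_{P+1}^2}. \qquad (16)$$

Since a minimum does exist it must occur when

$$B_k = \frac{\sqrt{L_k}\,M}{\sqrt{L_1} + \cdots + \sqrt{L_{P+1}}}, \qquad 1 \le k \le P + 1; \qquad (17)$$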

these are the only values of $B_1, \ldots, B_{P+1}$ that satisfy both (14) and (16).
Plugging (17) into (15) gives a fairly simple formula for the total number of seeks,

$$\bigl(\sqrt{L_1} + \cdots + \sqrt{L_{P+1}}\bigr)^2 / M, \qquad (18)$$

which may be compared with the number $(P + 1)(L_1 + \cdots + L_{P+1})/M$ obtained
if all buffers are equal in length. By exercise 1.2.3-31, the improvement is

$$\sum_{1 \le j < k \le P+1} \bigl(\sqrt{L_j} - \sqrt{L_k}\bigr)^2 / M.$$
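Formulas (15), (17), and (18) can be checked numerically on Ferguson’s four-way example. In this sketch (function names ours) the last entry of `lengths` is the output file $L_{P+1}$, and buffer sizes are real numbers, as assumed in the derivation.

```python
import math


def optimal_buffers(lengths, M):
    """Buffer sizes (17); lengths = [L_1, ..., L_P, L_{P+1}], the last being the output."""
    roots = [math.sqrt(L) for L in lengths]
    total = sum(roots)
    return [M * r / total for r in roots]


def seek_count(lengths, buffers):
    """Approximate number of seeks (15)."""
    return sum(L / B for L, B in zip(lengths, buffers))


# Four-way merge of runs of length L0 = 1, output of length 4, with M = 60.
lengths, M = [1, 1, 1, 1, 4], 60
equal = [M / 5] * 5
best = optimal_buffers(lengths, M)
print(best)                                                   # M/6 per input, M/3 for output
print(seek_count(lengths, equal), seek_count(lengths, best))  # 40/M versus 36/M
```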


Unfortunately  formula  (18)  does  not  lend  itself  to  an  easy  determination  of 
optimum  merge  patterns  as  in  Theorem  K (see  exercise  14). 

The use of chaining. M. A. Goetz [CACM 6 (1963), 245-248] has suggested
an  interesting  way  to  avoid  seek  time  on  output,  by  linking  individual  tracks 
together.  His  idea  requires  a fairly  fancy  set  of  disk  storage  management  routines, 
but  it  applies  to  many  problems  besides  sorting,  and  it  may  therefore  be  a very 
worthwhile  technique  for  general-purpose  use. 

The concept is simple: Instead of allocating tracks sequentially within
cylinders of the disk, we link them together and maintain lists of available space,
one for each cylinder. When it is time to output a track of information, we write
it on the current cylinder (wherever the access arm happens to be), unless that
cylinder is full. In this way the seek time usually disappears.

The catch is that we can’t store a link-to-next-track within the track itself,
since  the  necessary  information  isn’t  known  at  the  right  time.  (We  could  store  a 
link-to-previous-track  and  read  the  file  backwards  on  the  next  pass,  if  that  were 
suitable.)  A table  of  link  addresses  for  the  tracks  of  each  file  can  be  maintained 
separately,  because  it  requires  comparatively  little  space.  The  available  space 
lists  can  be  represented  compactly  by  using  bit  tables,  with  1000  bits  specifying 
the  availability  or  unavailability  of  1000  tracks. 
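The chaining scheme is described only in outline, so this Python sketch fills in details of its own; the class and method names are hypothetical. Each cylinder's bit table is a Python integer, each file's link table is a list of (cylinder, track) pairs kept apart from the data, and when the arm's cylinder is full the sketch falls back to the nearest cylinder that still has room, a choice the text does not specify.

```python
class ChainedTrackAllocator:
    """Hypothetical sketch of track chaining with per-cylinder bit tables."""

    def __init__(self, cylinders, tracks_per_cylinder):
        # Bit t of free[c] is 1 when track t of cylinder c is available.
        self.free = [(1 << tracks_per_cylinder) - 1 for _ in range(cylinders)]
        self.links = {}  # file name -> list of (cylinder, track) pairs, in write order

    def write_track(self, file, arm_cylinder):
        """Allocate a track for the next block of `file`, preferring the arm's cylinder."""
        # Our assumption: if that cylinder is full, use the nearest one with room.
        for c in sorted(range(len(self.free)), key=lambda cyl: abs(cyl - arm_cylinder)):
            bits = self.free[c]
            if bits:
                t = (bits & -bits).bit_length() - 1   # lowest-numbered free track
                self.free[c] &= ~(1 << t)
                self.links.setdefault(file, []).append((c, t))
                return c, t
        raise RuntimeError("no free tracks left")


disk = ChainedTrackAllocator(cylinders=3, tracks_per_cylinder=2)
for _ in range(3):
    print(disk.write_track("run1", arm_cylinder=1))
print(disk.links["run1"])
```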

Forecasting  revisited.  Algorithm  5.4.6F  shows  that  we  can  forecast  which 
input buffer of a P-way merge will empty first, by looking at the last keys in
each  buffer.  Therefore  we  can  be  reading  and  computing  at  the  same  time. 
That  algorithm  uses  floating  input  buffers,  not  dedicated  to  a particular  file;  so 
the  buffers  must  all  be  the  same  size,  and  the  buffer  allocation  technique  above 
cannot  be  used.  But  the  restriction  to  a uniform  buffer  size  is  no  great  loss,  since 
computers  now  have  much  larger  internal  memories  than  they  used  to.  Nowadays 
a natural  buffer  size,  such  as  the  capacity  of  a full  disk  track,  often  suggests  itself. 

Let  us  therefore  imagine  that  the  P runs  to  be  merged  each  consist  of  a 
sequence  of  data  blocks,  where  each  block  (except  possibly  the  last)  contains 
exactly  B records.  D.  L.  Whitlow  and  A.  Sasson  developed  an  interesting 
algorithm called SyncSort [U.S. Patent 4210961 (1980)], which improves on
Algorithm 5.4.6F by needing only three buffers of size B together with a memory
pool holding PB records and PB pointers. By contrast, Algorithm 5.4.6F
requires 2P input buffers and 2 output buffers, but no pointers.

SyncSort  begins  by  reading  the  first  block  of  each  run  and  putting  these  PB 
records  into  the  memory  pool.  Each  record  in  the  memory  pool  is  linked  to  its 
successor  in  the  run  it  belongs  to,  except  that  the  final  record  in  each  block  has 
no  successor  as  yet.  The  smallest  of  the  keys  in  those  final  records  determines 
the run that will need to be replenished first, so we begin to read the second block
of  that  run  into  the  first  buffer.  Merging  begins  as  soon  as  that  second  block  has 
been  read;  by  looking  at  its  final  key  we  can  accurately  forecast  the  next  relevant 
block,  and  we  can  continue  in  the  same  way  to  prefetch  exactly  the  right  blocks 
to  input,  just  before  they  are  needed. 
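The forecasting rule can be sketched in Python with a priority queue keyed on the final key of the most recently read block of each run. This models only the read order; SyncSort's three-buffer circle, memory pool, and successor links are not represented, and the function name is ours.

```python
import heapq


def forecast_read_order(runs):
    """runs[r] is a list of blocks, each a sorted list of keys, with the blocks of a run
    in order. Assuming the first block of every run is already in memory, return the
    (run, block) pairs in the order in which they must be read."""
    heap = [(blocks[0][-1], r) for r, blocks in enumerate(runs) if blocks]
    heapq.heapify(heap)
    next_block = [1] * len(runs)
    order = []
    while heap:
        _, r = heapq.heappop(heap)        # the run whose records in memory run out first
        b = next_block[r]
        if b < len(runs[r]):
            order.append((r, b))
            heapq.heappush(heap, (runs[r][b][-1], r))
            next_block[r] = b + 1
    return order


runs = [[[1, 2], [3, 4], [9, 10]],
        [[5, 6], [7, 8]]]
print(forecast_read_order(runs))
```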

The  three  SyncSort  buffers  are  arranged  in  a circle.  As  merging  proceeds, 
the  computer  is  processing  data  in  the  current  buffer,  while  input  is  being  read 
into  the  next  buffer  and  output  is  being  written  from  the  third.  The  merging 
algorithm  exchanges  each  record  in  the  current  buffer  with  the  next  record  of 
output,  namely  the  record  in  the  memory  pool  that  has  the  smallest  key.  The 
selection  tree  and  the  successor  links  are  also  updated  appropriately  as  we  make 
each  exchange.  Once  the  end  of  the  current  buffer  is  reached,  we  are  ready  to 
rotate  the  buffer  circle:  The  reading  buffer  becomes  current,  the  writing  buffer 
is used for reading, and we begin to write from the former current buffer.

Many  extensions  of  this  basic  idea  are  possible,  depending  on  hardware 
capabilities.  For  example,  we  might  use  two  disks,  one  for  reading  and  one  for 
writing,  so  that  input  and  output  and  merging  can  all  take  place  simultaneously. 
Or  we  might  be  able  to  overlap  seek  time  by  extending  the  circle  to  four  or  more 
buffers,  as  in  Fig.  26  of  Section  1.4.4,  and  deviating  from  the  forecast  input  order. 


Using  several  disks.  Disk  devices  once  were  massive  both  in  size  and  weight, 
but  they  became  dramatically  smaller,  lighter,  and  less  expensive  during  the 
late  1980s  — although  they  began  to  hold  more  data  than  ever  before.  Therefore 
people  began  to  design  algorithms  for  once-unimaginable  clusters  of  5 or  10  or 
50  disk  devices  or  for  even  larger  disk  farms. 

One easy way to gain speed with additional disks is to use the technique
of disk striping for large files. Suppose we have D disk units, numbered 0, 1,
..., D − 1, and consider a file that consists of L blocks $a_0 a_1 \ldots a_{L-1}$. Striping
this file on D disks means that we put block $a_j$ on disk number $j \bmod D$; thus,
disk 0 holds $a_0 a_D a_{2D} \ldots$, disk 1 holds $a_1 a_{D+1} a_{2D+1} \ldots$, etc. Then we can
perform D reads or D writes simultaneously on D-block groups $a_0 a_1 \ldots a_{D-1}$,
$a_D a_{D+1} \ldots a_{2D-1}$, ..., which are called superblocks. The individual blocks of
each superblock should be on corresponding cylinders on different disks so that
the seek time will be the same on each unit. In essence, we are acting as if we
had a single disk unit with blocks and buffers of size DB, but the input and
output operations run up to D times faster.

An elegant improvement on superblock striping can be used when we’re
doing 2-way merging, or in general whenever we want to match records with
equal keys in two files that are in order by keys. Suppose the blocks $a_0 a_1 a_2 \ldots$ of
the first file are striped on D disks as above, but the blocks $b_0 b_1 b_2 \ldots$ of the other
file are striped in the reverse direction, with block $b_j$ on disk $(D - 1 - j) \bmod D$.
For example, if D = 5 the blocks $a_j$ appear respectively on disks 0, 1, 2, 3, 4,
0, 1, ..., while the blocks $b_j$ for $j \ge 0$ appear on 4, 3, 2, 1, 0, 4, 3, .... Let $\alpha_j$
be the last key of block $a_j$ and let $\beta_j$ be the last key of block $b_j$. By examining
the $\alpha$’s and $\beta$’s we can forecast the sequence in which we will want to read the
data blocks; this sequence might, for example, be

$$a_0 b_0 a_1 a_2 b_1 \;\; a_3 a_4 b_2 a_5 a_6 \;\; a_7 a_8 b_3 b_4 b_5 \;\; b_6 b_7 b_8 b_9 b_{10} \;\; \ldots.$$

These blocks appear respectively on disks

$$04123 \;\; 34201 \;\; 23104 \;\; 32104 \;\; \ldots$$

when D = 5, and if we read them five at a time we will be inputting successively
from disks {0, 4, 1, 2, 3}, {3, 4, 2, 0, 1}, {2, 3, 1, 0, 4}, {3, 2, 1, 0, 4}, ...; there will
never be a conflict in which we need to read two blocks from the same disk at the
same time! In general, with D disks we can read D at a time without conflict,
because the first group will have k blocks $a_0 \ldots a_{k-1}$ on disks 0 through k − 1 and
D − k blocks $b_0 \ldots b_{D-k-1}$ on disks D − 1 through k, for some k; then we will be
poised to continue in the same way but with disk numbers shifted cyclically by k.
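The cyclic-shift argument can be tested mechanically. This Python sketch (names ours) assigns disks to a forecast order written as a string of 'a' and 'b' characters, reproduces the disk sequence of the D = 5 example, and then checks many random interleavings for conflicts within each group of D consecutive reads.

```python
import random


def gilbreath_disks(order, D):
    """Disk numbers for a forecast order such as 'abaab...': block a_j lives on disk
    j mod D and block b_j on disk (D - 1 - j) mod D."""
    next_a = next_b = 0
    disks = []
    for name in order:
        if name == "a":
            disks.append(next_a % D)
            next_a += 1
        else:
            disks.append((D - 1 - next_b) % D)
            next_b += 1
    return disks


def conflict_free(disks, D):
    """True if every group of D consecutive reads uses different disks."""
    return all(len(set(disks[i:i + D])) == len(disks[i:i + D]) for i in range(0, len(disks), D))


# The order in the text: a0 b0 a1 a2 b1, a3 a4 b2 a5 a6, a7 a8 b3 b4 b5, b6 b7 b8 b9 b10.
example = "abaab" "aabaa" "aabbb" "bbbbb"
print(gilbreath_disks(example, 5))

# Any interleaving of the two files works, as the cyclic-shift argument predicts.
rng = random.Random(2023)
for D in range(1, 9):
    for _ in range(200):
        order = "".join(rng.choice("ab") for _ in range(10 * D))
        assert conflict_free(gilbreath_disks(order, D), D)
```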

This trick is well known to card magicians, who call it the Gilbreath principle;
it was invented during the 1960s by Norman Gilbreath [see Martin Gardner,
Mathematical Magic Show (New York: Knopf, 1977), Chapter 7; N. Gilbreath,
Genii 52 (1989), 743-744]. We need to know the $\alpha$’s and $\beta$’s, to decide what
blocks should be read next, but that information takes up only a small fraction of
the space needed by the a’s and b’s, and it can be kept in separate files. Therefore
we need fewer buffers to keep the input going at full speed (see exercise 23).
Randomized striping. If we want to do P-way merging with D disks when
P and  D are  large,  we  cannot  keep  reading  the  information  simultaneously  from 
D disks  without  conflict  unless  we  have  a large  number  of  buffers,  because  there 
is  no  analog  of  the  Gilbreath  principle  when  P > 2.  No  matter  how  we  allocate 
the  blocks  of  a file  to  disks,  there  will  be  a chance  that  we  might  need  to  read 
many  blocks  into  memory  before  we  are  ready  to  use  them,  because  the  blocks 
that  we  really  need  might  all  happen  to  reside  on  the  same  disk. 

Suppose,  for  example,  that  we  want  to  do  8-way  merging  on  5 disks,  and 
suppose  that  the  blocks  aoaia2  • • • , 606162  • • • , ■ ■ ■ , 606162  ...  of  8 runs  have 
been  striped  with  aj  on  disk  j mod  D,  bj  on  disk  ( j + 1)  mod  D,  hj  on  disk 
(j  + ?)  mod  D.  We  might  need  to  access  these  blocks  in  the  order 

aobocodoeo  fogohodiei  d2e2d3aifi  bigi <22/263  d4cih\b2g2  a3f3e4d3de  . . . ; (19) 

then  they  appear  on  the  respective  disks 

012340124001111222222333333334...,  (20) 

so  our  best  bet  is  to  input  them  as  follows: 

Time  1 Time  2 Time  3 Time  4 Time  5 

aob0c0d0e0  fogohoCidi  Gi^bihide  d3d3gi 62  ? ? ai<i2<?2  ? 

Time  6 Time  7 Time  8 Time  9 
?/i/2a3?  ??e3/3?  ??d4e4?  ? ? ? d5  ? (21) 

By the time we are able to look at block d₅, we need to have read d₆ as well as
15 blocks of future data denoted by “?”, because of congestion on disk 3. And
we will not yet be done with the seven buffers containing remnants of a₃, b₂, c₁,
e₄, f₃, g₂, and h₁; so we will need buffer space for at least (16 + 8 + 5)B input
records in this particular example.
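The disk sequence (20) and the full-speed schedule (21) follow mechanically from (19). As a minimal sketch (Python; the helper names are ours, not the text's), assume block labels such as "d4" for block 4 of run d, number the runs a, b, ..., h as k = 0, 1, ..., 7 so that block j of run k sits on disk (j + k) mod D, and show blocks beyond the listed prefix of (19) as "?":

```python
# Chronological order (19) in which the 8-way merge needs its blocks.
CHRONO = ("a0 b0 c0 d0 e0 f0 g0 h0 d1 e1 d2 e2 d3 a1 f1 "
          "b1 g1 a2 f2 e3 d4 c1 h1 b2 g2 a3 f3 e4 d5 d6").split()


def disk_of(block: str, D: int) -> int:
    """Run k (a=0, b=1, ...) keeps its block j on disk (j + k) mod D."""
    k = ord(block[0]) - ord("a")
    j = int(block[1:])
    return (j + k) % D


def full_speed_schedule(chrono, D):
    """At time t, read the t-th block (chronologically) stored on each disk.

    Returns one row per time step with one entry per disk; None marks a
    block lying beyond the known part of the chronological sequence.
    """
    per_disk = [[] for _ in range(D)]
    for block in chrono:
        per_disk[disk_of(block, D)].append(block)
    depth = max(len(blocks) for blocks in per_disk)
    return [[blocks[t] if t < len(blocks) else None for blocks in per_disk]
            for t in range(depth)]


D = 5
print("".join(str(disk_of(b, D)) for b in CHRONO))  # 012340124001111222222333333334
for t, row in enumerate(full_speed_schedule(CHRONO, D), start=1):
    print(f"Time {t}:", " ".join(b or "?" for b in row))
```

The fifteen "?" entries in the nine rows are the blocks of future data mentioned above; they pile up because disk 3 holds nine of the thirty listed blocks.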

The  simple  superblock  approach  to  disk  striping  would  proceed  instead  to 
read blocks a₀a₁a₂a₃a₄ at time 1, b₀b₁b₂b₃b₄ at time 2, ..., h₀h₁h₂h₃h₄ at time 8,
then d₅d₆d₇d₈d₉ at time 9 (since d₅d₆d₇d₈d₉ is the superblock needed next), and
so  on.  Using  the  SyncSort  strategy,  it  would  require  buffers  for  (P  + 3)  DB 
records  and  PDB  pointers  in  memory.  The  more  versatile  approach  indicated 
above  can  be  shown  to  need  only  about  half  as  much  buffer  space;  but  the 
memory  requirement  is  still  approximately  proportional  to  PDB  when  P and  D 
are  large  (see  exercise  24). 

R. D. Barve, E. F. Grove, and J. S. Vitter [Parallel Computing 23 (1997),
601–631] showed that a slight modification of the independent-block approach
leads to an algorithm that keeps the disk input/output running at nearly its full
speed while needing only O(P + D log D) buffer blocks instead of Ω(PD). Their
technique of randomized striping puts block j of run k on disk (x_k + j) mod D,
where x_k is a random integer selected just before run k is first written. Instead
of  insisting  that  D blocks  are  constantly  being  input,  one  from  each  disk,  they 
introduced  a simple  mechanism  for  holding  back  when  there  isn’t  enough  space 
to  keep  reading  ahead  on  certain  disks,  and  they  proved  that  their  method  is 
asymptotically  optimal. 
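A small Python sketch of this placement rule follows; it is an illustration under our own conventions (runs named by letters, blocks by labels such as "d4"), and the helper names are hypothetical. The text says only that x_k is "a random integer"; drawing it uniformly from 0, ..., D − 1 is our assumption. Applied with the fixed offsets (22) of the example below, `per_disk_lists` reproduces the listing (23).

```python
import random

# Chronological order (19) of the blocks needed by the 8-way merge.
CHRONO = ("a0 b0 c0 d0 e0 f0 g0 h0 d1 e1 d2 e2 d3 a1 f1 "
          "b1 g1 a2 f2 e3 d4 c1 h1 b2 g2 a3 f3 e4 d5 d6").split()


def choose_offsets(num_runs, D, rng):
    """Pick an offset x_k for each run (assumed uniform on 0..D-1)."""
    return [rng.randrange(D) for _ in range(num_runs)]


def disk_of(block, offsets, D):
    """Block j of run k lives on disk (x_k + j) mod D; "d4" is run d, block 4."""
    k, j = ord(block[0]) - ord("a"), int(block[1:])
    return (offsets[k] + j) % D


def per_disk_lists(chrono, offsets, D):
    """List each disk's blocks in chronological order, as in (23)."""
    lists = [[] for _ in range(D)]
    for block in chrono:
        lists[disk_of(block, offsets, D)].append(block)
    return lists


# Offsets (22): the digits 3, 1, 4, 1, 5, 9, 2, 6 of pi, reduced mod 5.
PI_OFFSETS = [3, 1, 4, 1, 0, 4, 2, 1]
for d, blocks in enumerate(per_disk_lists(CHRONO, PI_OFFSETS, 5)):
    print(f"{d}: {' '.join(blocks)}")

# A genuinely random layout for a fresh merge (seeded only for repeatability).
offsets = choose_offsets(8, 5, random.Random(1997))
```

Within each run the blocks still cycle through the disks in order; only the starting disk is randomized, which is what keeps any one chronological sequence from piling up on a single disk.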
To do P-way merging on D disks with randomized striping, we can maintain
2D  + P + Q - 1 floating  input  buffers,  each  holding  a block  of  B records.  Input 
is  typically  being  read  into  D of  these  buffers,  called  active  read  buffers,  while  P 
of  the  others  contain  the  leading  blocks  from  which  records  are  currently  being 
merged;  these  are  called  active  merge  buffers.  The  remaining  D + Q - 1 “scratch 
buffers”  are  either  empty  or  they  hold  prefetched  data  that  will  be  needed  later; 
Q is  a nonnegative  parameter  that  can  be  increased  in  order  to  lessen  the  chance 
that  reading  will  be  held  back  on  any  of  the  disks. 

The  blocks  of  all  runs  can  be  arranged  into  chronological  order  as  in  (19): 
First  we  list  block  0 of  each  run,  then  we  list  the  others  by  determining  the  order 
in  which  active  merge  buffers  will  become  empty.  As  explained  above,  this  order 
is  determined  by  the  final  keys  in  each  block,  so  we  can  readily  forecast  which 
blocks  ought  to  be  prefetched  first. 
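The forecast can be sketched in a few lines of Python (the function name is hypothetical; for simplicity the runs are held in memory as sorted key lists cut into blocks of B records, whereas in practice only the final key of each block would be kept). Block j + 1 of a run is needed as soon as the merge has output the final key of block j, so after the initial blocks the remaining ones are ordered by the final keys of their predecessors; ties are broken by run number, an arbitrary choice.

```python
def chronological_order(runs, B):
    """Forecast the order in which a P-way merge needs the blocks of its runs.

    runs[k] is the sorted list of keys of run k; block j of run k consists of
    runs[k][j*B:(j+1)*B].  Returns a list of (k, j) pairs.
    """
    # Every run's block 0 is needed before merging can start.
    order = [(k, 0) for k, run in enumerate(runs) if run]
    pending = []
    for k, run in enumerate(runs):
        num_blocks = (len(run) + B - 1) // B
        for j in range(1, num_blocks):
            # Block j is needed once the final key of block j-1 has been output.
            pending.append((run[j * B - 1], k, j))
    pending.sort()  # by that final key; ties broken by run number
    order.extend((k, j) for _, k, j in pending)
    return order


runs = [[1, 4, 7, 10], [2, 3, 8, 9], [5, 6, 11, 12]]
print(chronological_order(runs, B=2))
# [(0, 0), (1, 0), (2, 0), (1, 1), (0, 1), (2, 1)]
```

Here run 1 is the first to exhaust its leading block (after key 3), so its second block heads the list of prefetch candidates.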

Let’s  consider  example  (19)  again,  with  P = 8,  D = 5,  and  Q = 4.  Now  we 
will have only 2D + P + Q − 1 = 21 input buffer blocks to work with instead
of  the  29  that  were  needed  above  for  maximum-speed  reading.  We  will  use  the 
offsets 


x₁ = 3, x₂ = 1, x₃ = 4, x₄ = 1, x₅ = 0, x₆ = 4, x₇ = 2, x₈ = 1     (22)

(suggested by the decimal digits of π) for runs a, b, ..., h; thus the respective
disks contain

Disk   Blocks
0:     e₀ f₁ a₂ d₄ c₁
1:     b₀ d₀ h₀ e₁ f₂ a₃ d₅
2:     g₀ d₁ e₂ b₁ h₁ f₃ d₆ ...          (23)
3:     a₀ d₂ g₁ e₃ b₂
4:     c₀ f₀ d₃ a₁ g₂ e₄

if  we  list  their  blocks  in  chronological  order.  The  “random”  offsets  of  (22), 
together  with  sequential  striping  within  each  run,  will  tend  to  minimize  the 
congestion  of  any  particular  chronological  sequence.  The  actual  processing  now 
goes  like  this: 


         Active reading    Active merging             Scratch                       Waiting for
Time 1   e₀ b₀ g₀ a₀ c₀                               ( )                           a₀
Time 2   f₁ d₀ d₁ d₂ f₀    a₀                         b₀ c₀ (e₀ g₀)                 d₀
Time 3   a₂ h₀ e₂ g₁ d₃    a₀ b₀ c₀ d₀                e₀ f₀ g₀ (d₁ d₂ f₁)           h₀
Time 4   a₂ e₁ b₁ g₁ a₁    a₀ b₀ c₀ d₀ e₀ f₀ g₀ h₀    d₁ (d₂ e₂ d₃ f₁ g₁ a₂)        e₁      (24)
Time 5   d₄ f₂ h₁ e₃ g₂    a₀ b₀ c₀ d₁ e₁ f₀ g₀ h₀    d₂ e₂ d₃ a₁ f₁ b₁ g₁ a₂ ( )   f₂
Time 6   c₁ a₃ f₃ b₂ e₄    a₂ b₁ c₀ d₃ e₂ f₂ g₁ h₀    e₃ d₄ (h₁ g₂)                 c₁
Time 7   ? d₅ d₆ ? ?       a₂ b₁ c₁ d₄ e₃ f₂ g₁ h₀    h₁ b₂ g₂ a₃ f₃ e₄ ( )         d₅

At each unit of time we are waiting for the chronologically first block that is
not yet merged and not yet in a scratch buffer; this is one of the blocks that is
currently being input to an active read buffer. We assume that the computer
is much faster than the disks; thus, all blocks before the one we are waiting for
will have already entered the merging process before input is complete. We also
assume that sufficient output buffers are available so that merging will not be
delayed for lack of a place to put the output (see exercise 26). When a round
of input is complete, the block we were waiting for is immediately classified as an
active merge buffer, and the empty merge buffer it replaces will be used for the
next active reading. The other D − 1 active read buffers now trade places with the
D − 1 least important scratch buffers; scratch buffers are ranked by chronological
order of their contents. On the next round we will wait for the first unmerged
block that isn’t present in the scratch buffers. Any scratch buffers preceding that
block in chronological order will become part of the active merge before the next
input cycle, but the others (shown in parentheses above) will be carried over
and they will remain as scratch buffers on the next round. However, at most Q
of the buffers in parentheses can be carried over, because we will need to convert
D − 1 scratch buffers to active read status immediately after the input is ready.
Any additional scratch buffers are effectively blanked out, as if they hadn’t been
read. This blanking-out occurs at Time 4 in (24): We cannot carry all six of
the blocks d₂e₂d₃f₁g₁a₂ over to Time 5, because Q = 4, so we reread g₁ and a₂.
Otherwise the reading operations in this example take place at full speed.
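The buffer discipline just described can be simulated directly. The Python sketch below is our own reconstruction of these rules, not code from the text, and its function and variable names are hypothetical; it keeps the same assumptions (a computer much faster than the disks, and enough output buffers). Each round it finds the block being waited for, lets the scratch blocks that precede it join the merge, carries at most Q of the others (blanking out the rest), and lets each disk read its first block that is neither merged nor held. Because at most Q blocks are carried and at most D − 1 new ones arrive, the scratch pool never needs more than its D + Q − 1 buffers. With the data of (19) and (22) and Q = 4, it should reproduce the "Active reading" and "Waiting for" columns of (24), including the rereading of g₁ and a₂ at Time 4.

```python
CHRONO = ("a0 b0 c0 d0 e0 f0 g0 h0 d1 e1 d2 e2 d3 a1 f1 "
          "b1 g1 a2 f2 e3 d4 c1 h1 b2 g2 a3 f3 e4 d5 d6").split()
OFFSETS = [3, 1, 4, 1, 0, 4, 2, 1]  # offsets (22) for runs a, ..., h


def randomized_striping_rounds(chrono, offsets, D, Q):
    """Simulate the input rounds of a P-way merge with randomized striping.

    Block "d4" is block 4 of run d; block j of run k is on disk (x_k + j) mod D.
    Returns a list of (reads, waiting_for) pairs, one per round; reads[d] is
    None when disk d has nothing left to read from the given sequence.
    """
    rank = {b: i for i, b in enumerate(chrono)}

    def disk(b):
        return (offsets[ord(b[0]) - ord("a")] + int(b[1:])) % D

    per_disk = [[b for b in chrono if disk(b) == d] for d in range(D)]
    merged, scratch = set(), set()
    rounds = []
    while True:
        # Wait for the chronologically first block neither merged nor prefetched.
        waiting = next((b for b in chrono
                        if b not in merged and b not in scratch), None)
        if waiting is None:
            break  # everything still needed is already in scratch buffers
        # Scratch blocks that precede it join the merge during this round ...
        early = {b for b in scratch if rank[b] < rank[waiting]}
        merged |= early
        # ... at most Q of the others are carried over; the rest are blanked out.
        scratch = set(sorted(scratch - early, key=rank.get)[:Q])
        # Each disk reads its first block that is not merged and not held.
        reads = [next((b for b in blocks
                       if b not in merged and b not in scratch), None)
                 for blocks in per_disk]
        rounds.append((reads, waiting))
        # When input completes, the awaited block becomes an active merge
        # buffer and the other blocks just read become scratch buffers.
        merged.add(waiting)
        scratch |= {b for b in reads if b is not None and b != waiting}
    return rounds


for t, (reads, waiting) in enumerate(
        randomized_striping_rounds(CHRONO, OFFSETS, 5, 4), start=1):
    print(f"Time {t}: reading {' '.join(b or '?' for b in reads)};"
          f" waiting for {waiting}")
```

Raising Q to 8 in this sketch removes the blanking-out at Time 4: disks 0 and 3 then read d₄ and e₃ instead of rereading a₂ and g₁.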

Exercise 29 proves that, given any chronological sequence of runs to be
merged, the method of randomized striping will achieve the minimum number
of disk reads within a factor of r(D, Q + 2), on the average, where the function r
is tabulated in Table 2. For example, if D = 4 and Q = 18, the average time
to do a P-way merge on L blocks of data with 4 disks and P + 25 input buffers
will be at most the time to read r(4, 20)L/D ≈ 1.785L/4 blocks on a single disk.
This  theoretical  upper  bound  is  quite  conservative;  in  practice  the  performance 
is  even  better,  very  near  the  optimum  time  L/4. 


Table 2

GUARANTEES ON THE PERFORMANCE OF RANDOMIZED STRIPING

          d = 2   d = 4   d = 8   d = 16  d = 32  d = 64  d = 128 d = 256 d = 512 d = 1024
r(d,d)    1.500   2.460   3.328   4.087   4.503   5.175   5.431   5.909   6.278   6.567
r(d,2d)   1.500   2.190   2.698   3.103   3.392   3.718   3.972   4.222   4.455   4.689
r(d,3d)   1.499   1.986   2.365   2.662   2.917   3.165   3.356   3.536   3.747   3.879
r(d,4d)   1.467   1.888   2.183   2.434   2.654   2.847   2.992   3.155   3.316   3.434
r(d,5d)   1.444   1.785   2.056   2.277   2.458   2.613   2.759   2.910   3.024   3.142
r(d,6d)   1.422   1.724   1.969   2.156   2.319   2.465   2.603   2.714   2.820   2.937
r(d,7d)   1.393   1.683   1.889   2.067   2.218   2.346   2.459   2.567   2.675   2.780
r(d,8d)   1.370   1.633   1.836   1.997   2.130   2.249   2.358   2.464   2.556   2.639
r(d,9d)   1.353   1.597   1.787   1.933   2.062   2.174   2.273   2.363   2.450   2.536
r(d,10d)  1.339   1.570   1.743   1.890   2.005   2.107   2.201   2.289   2.375   2.452

Will  keysorting  help?  When  records  are  long  and  keys  are  short,  it  is  very 
tempting  to  create  a new  file  consisting  simply  of  the  keys  together  with  a serial 
number  specifying  their  original  file  location.  After  sorting  this  key  file,  we  can 
replace  the  keys  by  the  successive  numbers  1,2,...;  the  new  file  can  then  be 
sorted  by  original  file  location  and  we  will  have  a convenient  specification  of  how 
to  unshuffle  the  records  for  the  final  rearrangement.  Schematically,  the  process 
has the following form:


i)    Original file   (K₁, I₁)(K₂, I₂) ... (K_N, I_N)                 long
ii)   Key file        (K₁, 1)(K₂, 2) ... (K_N, N)                     short
iii)  Sorted (ii)     (K_{p₁}, p₁)(K_{p₂}, p₂) ... (K_{p_N}, p_N)     short
iv)   Edited (iii)    (1, p₁)(2, p₂) ... (N, p_N)                     short
v)    Sorted (iv)     (q₁, 1)(q₂, 2) ... (q_N, N)                     short
vi)   Edited (i)      (q₁, I₁)(q₂, I₂) ... (q_N, I_N)                 long

Here p_j = k if and only if q_k = j. The two sorting processes in (iii) and (v) are
comparatively fast (perhaps even internal sorts), since the records aren’t very
long. In stage (vi) we have reduced the problem to sorting a file whose keys are
simply the numbers {1, 2, ..., N}; each record now specifies exactly where it is
to be moved.
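In memory, the bookkeeping of stages (ii) through (v) is short. The following Python sketch (hypothetical function names, 1-origin positions as in the schematic) computes q₁, ..., q_N and then performs the final move as an ordinary in-memory permutation; on a real external file that last step is precisely the rearrangement problem discussed next. Equal keys are ordered by original position, so the result is stable.

```python
def keysort_targets(keys):
    """Stages (ii)-(v): return q, where q[k-1] is the final position of record k.

    keys[k-1] is K_k, the key of the record in position k of the original file.
    """
    key_file = [(key, k) for k, key in enumerate(keys, start=1)]        # (ii)
    sorted_keys = sorted(key_file)                                      # (iii): (K_{p_j}, p_j)
    edited = [(j, p) for j, (_, p) in enumerate(sorted_keys, start=1)]  # (iv): (j, p_j)
    by_location = sorted(edited, key=lambda pair: pair[1])              # (v): (q_k, k)
    return [q for q, _ in by_location]


def rearrange(records, q):
    """Stage (vi) plus the final move: record k goes to position q_k."""
    result = [None] * len(records)
    for record, target in zip(records, q):
        result[target - 1] = record
    return result


records = [(50, "I1"), (8, "I2"), (51, "I3"), (6, "I4")]
q = keysort_targets([key for key, _ in records])
print(q)                      # [3, 2, 4, 1]
print(rearrange(records, q))  # [(6, 'I4'), (8, 'I2'), (50, 'I1'), (51, 'I3')]
```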

The  external  rearrangement  problem  that  remains  after  stage  (vi)  seems 
trivial,  at  first  glance;  but  in  fact  it  is  rather  difficult,  and  no  really  good 
algorithms  (significantly  better  than  sorting)  have  yet  been  found.  We  could 
obviously  do  the  rearrangement  in  N steps,  moving  one  record  at  a time;  for 
large enough N this is better than the N log N of a sorting method. But N is
never  that  large;  N is,  however,  sufficiently  large  that  N seeks  are  unthinkable. 

A radix  sorting  method  can  be  used  efficiently  on  the  edited  records  of  (vi), 
since  their  keys  have  a perfectly  uniform  distribution.  On  modern  computers,  the 
processing  time  for  an  eight-way  distribution  is  much  faster  than  the  processing 
time  for  an  eight-way  merge;  hence  a distribution  sort  is  probably  the  best 
procedure.  (See  Section  5.4.7,  and  see  also  exercise  19.) 
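Because the keys of the edited records (vi) are exactly the integers 1 through N, each bin of an eight-way distribution receives a known, equal share of the records, and a bin's key range is known in advance. The in-memory Python sketch below illustrates such a most-significant-digit distribution; the function name, the default fan-out of 8, and the cutoff for small subfiles are our own illustrative choices, and an external version would write each bin to a separate file rather than a Python list.

```python
import random


def distribution_sort(records, lo, hi, ways=8, small=16):
    """Sort edited records (q, info) whose q's are exactly lo, lo+1, ..., hi-1."""
    n = hi - lo
    if n <= small:
        # A small subfile: each record's key is its own address.
        result = [None] * n
        for q, info in records:
            result[q - lo] = (q, info)
        return result
    bins = [[] for _ in range(ways)]
    for q, info in records:
        # Keys are uniform, so key ranges of equal width give equal-sized bins.
        bins[(q - lo) * ways // n].append((q, info))
    result = []
    for b, contents in enumerate(bins):
        # Bin b holds exactly the keys lo + ceil(b*n/ways) .. lo + ceil((b+1)*n/ways) - 1.
        b_lo = lo + (b * n + ways - 1) // ways
        b_hi = lo + ((b + 1) * n + ways - 1) // ways
        result.extend(distribution_sort(contents, b_lo, b_hi, ways, small))
    return result


# Example: N edited records (q, I_q) in scrambled order.
N = 1000
edited = [(q, f"I{q}") for q in range(1, N + 1)]
random.Random(42).shuffle(edited)
result = distribution_sort(edited, 1, N + 1)
print(result[:3])  # [(1, 'I1'), (2, 'I2'), (3, 'I3')]
```

With N = 1000 and eight ways, every bin of the first pass gets exactly 125 records; the per-record work of a distribution pass is one multiplication and one division, which is why it is cheaper than the comparisons of a merge.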

On  the  other  hand,  it  seems  wasteful  to  do  a full  sort  after  the  keys  have 
already been sorted. One reason the external rearrangement problem is unexpectedly
difficult has been discovered by R. W. Floyd, who found a nontrivial
lower bound on the number of seeks required to rearrange records on a disk device
[Complexity of Computer Computations (New York: Plenum, 1972), 105–109].

It  is  convenient  to  describe  Floyd’s  result  in  terms  of  the  elevator  problem  of 
Section  5.4.8;  but  this  time  we  want  to  find  an  elevator  schedule  that  minimizes 
the  number  of  stops,  instead  of  minimizing  the  distance  traveled.  Minimizing 
the  number  of  stops  is  not  precisely  equivalent  to  finding  the  minimum-seek 
rearrangement  algorithm,  since  a stop  combines  input  to  the  elevator  with  output 
from  the  elevator;  but  the  stop-minimization  criterion  is  close  enough  to  indicate 
the  basic  ideas. 

We  shall  make  use  of  the  “discrete  entropy”  function 

$$F(n) = \sum_{1<k\le n} \bigl(\lceil \lg k \rceil + 1\bigr) = B(n) + n - 1 = n\lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + n, \qquad (25)$$

where B(n) is the binary insertion function, Eq. 5.3.1–(3). By Eq. 5.3.1–(34),
F(n) is the minimum external path length of a binary tree with n leaves, and

$$n \lg n \le F(n) \le n \lg n + 0.0861n. \qquad (26)$$
Since F(n) is convex and satisfies F(n) = n + F(⌊n/2⌋) + F(⌈n/2⌉), we know
by Lemma C above that

$$F(n) \le F(k) + F(n - k) + n, \qquad \text{for } 0 \le k \le n. \qquad (27)$$

This  relation  is  also  evident  from  the  external  path  length  characterization  of  F; 
it  is  the  crucial  fact  we  need  in  the  following  argument. 
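The relations (25)–(27) are easy to spot-check numerically. The Python sketch below is only an illustration, not part of Floyd's argument; it computes F(n) from the sum, from the closed form, and from the recurrence F(n) = n + F(⌊n/2⌋) + F(⌈n/2⌉), and tests the bounds (26) and (27) for small n. It relies on the identity ⌈lg k⌉ = (k − 1).bit_length(), which is exact for integers k ≥ 1.

```python
import math
from functools import lru_cache


def ceil_lg(k: int) -> int:
    """⌈lg k⌉ for k >= 1, computed exactly with integer arithmetic."""
    return (k - 1).bit_length()


def F_sum(n: int) -> int:
    """F(n) as the sum over 1 < k <= n of (⌈lg k⌉ + 1); F(0) = F(1) = 0."""
    return sum(ceil_lg(k) + 1 for k in range(2, n + 1))


def F_closed(n: int) -> int:
    """Closed form n⌈lg n⌉ - 2^⌈lg n⌉ + n, valid for n >= 1."""
    c = ceil_lg(n)
    return n * c - (1 << c) + n


@lru_cache(maxsize=None)
def F_rec(n: int) -> int:
    """Minimum external path length of a binary tree with n leaves."""
    if n <= 1:
        return 0
    return n + F_rec(n // 2) + F_rec((n + 1) // 2)


for n in range(1, 513):
    f = F_sum(n)
    assert f == F_closed(n) == F_rec(n)
    # (26): n lg n <= F(n) <= n lg n + 0.0861 n (tiny tolerance for rounding).
    assert n * math.log2(n) - 1e-9 <= f <= n * math.log2(n) + 0.0861 * n
for n in range(0, 65):
    # (27): F(n) <= F(k) + F(n - k) + n for 0 <= k <= n.
    assert all(F_sum(n) <= F_sum(k) + F_sum(n - k) + n for k in range(n + 1))

print([F_sum(n) for n in range(1, 9)])  # [0, 2, 5, 8, 12, 16, 20, 24]
```

The constant 0.0861 in (26) is the maximum of lg θ + 1 − θ over 1 ≤ θ < 2, attained near θ = 1/ln 2, where θ = 2^⌈lg n⌉/n.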

As  in  Section  5.4.8  we  shall  assume  that  each  floor  holds  b people,  the 
elevator holds m people, and there are n floors. Let s_ij be the number of people
currently on floor i whose destination is floor j. The togetherness rating of any
configuration of people in the building is defined to be the sum Σ_{1≤i,j≤n} F(s_ij).
For example, assume that b = m = n = 6 and that the 36 people are initially
scattered among the floors as follows:

uuuuuu
123456   123456   123456   123456   123456   123456        (28)

The  elevator  is  empty,  sitting  on  floor  1;  “u”  denotes  a vacant  position.  Each 
floor contains one person with each possible destination, so all s_ij are 1 and the
togetherness rating is zero. If the elevator now transports six people to floor 2,
we have the configuration

         123456
uuuuuu   123456   123456   123456   123456   123456        (29)

and the togetherness rating becomes 6F(0) + 24F(1) + 6F(2) = 12. Suppose the
elevator now carries 1, 1, 2, 3, 3, and 4 to floor 3:

                  112334
uuuuuu   245566   123456   123456   123456   123456        (30)

The  togetherness  rating  has  jumped  to  4F(2)  + 2F(3)  = 18.  When  all  people 
have  finally  been  transported  to  their  destinations,  the  togetherness  rating  will 
be  6F(6)  = 96. 
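The ratings quoted above can be recomputed mechanically. In this Python sketch (our own encoding, not Floyd's), a configuration maps each floor to the list of destinations of the people on it, with the elevator's passengers counted on the floor where it has stopped, as in the example; vacant positions are simply left out, since F(0) = 0.

```python
from collections import Counter


def F(n):
    """Discrete entropy F(n) of (25); F(0) = F(1) = 0."""
    return sum((k - 1).bit_length() + 1 for k in range(2, n + 1))


def togetherness(floors):
    """Sum of F(s_ij), where s_ij counts people on floor i bound for floor j."""
    return sum(F(s) for people in floors.values()
               for s in Counter(people).values())


everyone = [1, 2, 3, 4, 5, 6]
config28 = {i: everyone for i in range(1, 7)}
config29 = {1: [], 2: everyone + everyone, 3: everyone, 4: everyone,
            5: everyone, 6: everyone}
config30 = {1: [], 2: [2, 4, 5, 5, 6, 6], 3: everyone + [1, 1, 2, 3, 3, 4],
            4: everyone, 5: everyone, 6: everyone}
final = {j: [j] * 6 for j in range(1, 7)}

print([togetherness(c) for c in (config28, config29, config30, final)])
# [0, 12, 18, 96]
```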

Floyd  observed  that  the  togetherness  rating  can  never  increase  by  more  than 
b + m at  each  stop,  since  a set  of  s equal-destination  people  joining  with  a similar 
set of size s′ improves the rating by F(s + s′) − F(s) − F(s′) ≤ s + s′, by (27). Therefore
we  have  the  following  result. 

Theorem  F.  Let  t be  the  togetherness  rating  of  an  initial  configuration  of 
bn  people,  in  terms  of  the  definitions  above.  The  elevator  must  make  at  least 

$$\bigl\lceil (F(b)n - t)/(b + m) \bigr\rceil$$

stops in order to bring them all to their destinations. ∎
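As a quick numerical illustration of Theorem F (the helper names are ours), the bound for configuration (28), with b = m = n = 6 and t = 0, is ⌈96/12⌉ = 8 stops. The same function, read in disk terms with b records per block, memory for m records, and n blocks, gives the block-read bound of the next paragraph; the disk sizes below are arbitrary hypothetical values.

```python
import math


def F(n):
    """Discrete entropy F(n) of (25); F(0) = F(1) = 0."""
    return sum((k - 1).bit_length() + 1 for k in range(2, n + 1))


def min_stops(b, m, n, t):
    """Theorem F: at least ceil((F(b)*n - t) / (b + m)) stops are needed."""
    return max(0, -(-(F(b) * n - t) // (b + m)))  # integer ceiling division


print(min_stops(6, 6, 6, 0))  # 8, for configuration (28)

# Hypothetical disk sizes: 100 records per block, memory for 100000 records,
# 10**6 blocks, and an initial arrangement with t = 0.
b, m, n = 100, 100_000, 10**6
print(min_stops(b, m, n, 0), round(b * n * math.log2(b) / m))
```

The second line compares the exact bound with the approximation (bn lg b)/m; they agree closely because m is much larger than b here.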

Translating  this  result  into  disk  terminology,  let  there  be  bn  records,  with 
b per  block,  and  suppose  the  internal  memory  can  hold  m records  at  a time. 
Every  disk  read  brings  one  block  into  memory,  every  disk  write  stores  one  block, 
and s_ij is the number of records in block i that belong in block j. If n ≥ b,
there are initial configurations in which all the s_ij are ≤ 1; so t = 0 and at least
F(b)n/(b + m) ≈ (bn lg b)/m block-reading operations are necessary to rearrange
the records. (The factor lg b makes this lower bound nontrivial when b is large.)
Exercise  17  derives  a substantially  stronger  lower  bound  for  the  common  case 
that  m is  substantially  larger  than  b. 

EXERCISES 

1.  [M22]  The  text  explains  a method  by  which  the  average  latency  time  required 
to read a fraction x of a track is reduced from ½ to ½(1 − x²) revolutions. This is
the  minimum  possible  value,  when  there  is  one  access  arm.  What  is  the  corresponding 
minimum  average  latency  time  if  there  are  two  access  arms,  180°  apart,  assuming  that 
only  one  arm  can  transmit  data  at  any  one  time? 

2.  [M30]  (A.  G.  Konheim.)  The  purpose  of  this  problem  is  to  investigate  how  far  the 
access  arm  of  a disk  must  move  while  merging  files  that  are  allocated  “orthogonally” 
to  the  cylinders.  Suppose  there  are  P files,  each  containing  L blocks  of  records,  and 
assume  that  the  first  block  of  each  file  appears  on  cylinder  1,  the  second  on  cylinder  2, 
etc.  The  relative  order  of  the  last  keys  in  each  block  governs  the  access  arm  motion 
during  the  merge,  hence  we  may  represent  the  situation  in  the  following  mathematically 
tractable way: Consider a set of PL ordered pairs

$$\begin{matrix}
(a_{11}, 1) & (a_{21}, 1) & \ldots & (a_{P1}, 1)\\
(a_{12}, 2) & (a_{22}, 2) & \ldots & (a_{P2}, 2)\\
\vdots & \vdots & & \vdots\\
(a_{1L}, L) & (a_{2L}, L) & \ldots & (a_{PL}, L)
\end{matrix}$$

where the set {a_ij | 1 ≤ i ≤ P, 1 ≤ j ≤ L} consists of the numbers {1, 2, ..., PL} in
some order, and where a_ij < a_i(j+1) for 1 ≤ j < L. (Rows represent cylinders, columns
represent input files.) Sort the pairs on their first components and let the resulting
sequence be (1, j₁)(2, j₂) ... (PL, j_PL). Show that, if each of the (PL)!/L!^P choices of
the a_ij is equally likely, the average value of

$$|j_2 - j_1| + |j_3 - j_2| + \cdots + |j_{PL} - j_{PL-1}|$$

is

$$(L - 1)\Bigl(1 + (P - 1)\,2^{2L-2} \Big/ \binom{2L}{L}\Bigr).$$

[Hint: See exercise 5.2.1–14.] Notice that as L → ∞ this value is asymptotically equal
to ¼(P − 1)L√(πL) + O(PL).

3. [M15] Suppose the internal memory is limited so that 10-way merging is not
feasible. How can recurrence relations (3), (4), (5) be modified so that A₁(n) is the
minimum value of αD(T) + βE(T), over all n-leaved trees T having no internal nodes
of  degree  greater  than  9? 

► 4.  [M21]  Consider  a modified  form  of  the  square  root  buffer  allocation  scheme,  in 
which  all  P of  the  input  buffers  have  equal  length,  but  the  output  buffer  size  should 
be  chosen  so  as  to  minimize  seek  time. 

a)  Derive  a formula  corresponding  to  (2),  for  the  running  time  of  an  L-character 
P-way  merge. 

b)  Show  that  the  construction  in  Theorem  K can  be  modified  in  order  to  obtain  a 
merge  pattern  that  is  optimal  according  to  your  formula  from  part  (a). 




5.  [M20]  When  two  disks  are  being  used,  so  that  reading  on  one  is  overlapped  with 
writing  on  the  other,  we  cannot  use  merge  patterns  like  that  of  Fig.  93  since  some  leaves 
are  at  even  levels  and  some  are  at  odd  levels.  Show  how  to  modify  the  construction  of 
Theorem  K in  order  to  produce  trees  that  are  optimal  subject  to  the  constraint  that 
all  leaves  appear  on  even  levels  or  all  on  odd  levels. 

► 6.  [22]  Find  a tree  that  is  optimum  in  the  sense  of  exercise  5,  when  n = 23  and 
α = β = 1. (You may wish to use a computer.)

► 7. [M24] When the initial runs are not all the same length, the best merge pattern
(in the sense of Theorem H) minimizes αD(T) + βE(T), where D(T) and E(T) now
represent weighted path lengths: Weights w₁, ..., w_n (corresponding to the lengths of
the initial runs) are attached to each leaf of the tree, and the degree sums and path
lengths are multiplied by the appropriate weights. For example, if T is the tree of
Fig. 92, we would have D(T) = 6w₁ + 6w₂ + 7w₃ + 9w₄ + 9w₅ + 7w₆ + 4w₇ + 4w₈,
E(T) = 2w₁ + 2w₂ + 2w₃ + 3w₄ + 3w₅ + 2w₆ + w₇ + w₈.

Prove  that  there  is  always  an  optimal  pattern  in  which  the  shortest  k runs  are 
merged  first,  for  some  k. 

8. [49] Is there an algorithm that finds optimal trees for given α, β and weights
w₁, ..., w_n, in the sense of exercise 7, taking only O(n^c) steps for some c?

9. [HM39] (L. Hyafil, F. Prusker, J. Vuillemin.) Prove that, for fixed α and β,

$$A_1(n) = \Bigl(\min_{m \ge 2} \frac{\alpha m + \beta}{\log m}\Bigr)\, n \log n + O(n)$$

as n → ∞, where the O(n) term is ≥ 0.

10. [HM44] (L. Hyafil, F. Prusker, J. Vuillemin.) Prove that when α and β are fixed,
A₁(n) = αmn + βn + A_m(n) for all sufficiently large n, if m minimizes the coefficient
in  exercise  9. 

11. [M29] In the notation of (6) and (11), prove that f_m(n) + mn ≥ f(n) for all m ≥ 2
and n ≥ 2, and determine all m and n for which equality holds.

12.  [25]  Prove  that,  for  all  n > 0,  there  is  a tree  with  n leaves  and  minimum  degree 
path  length  (6),  with  all  leaves  at  the  same  level. 

13. [M24] Show that for 2 ≤ n ≤ d(α, β), where d(α, β) is defined in (12), the unique
best merge pattern in the sense of Theorem H is an n-way merge.

14.  [40]  Using  the  square  root  method  of  buffer  allocation,  the  seek  time  for  the 
merge pattern in Fig. 92 would be proportional to (√2 + √4 + √1 + √1 + √8)² +
(√1 + √1 + √2)² + (√1 + √2 + √1 + √4)² + (√1 + √1 + √2)²; this is the sum,
over each internal node, of (√n₁ + ··· + √n_m + √(n₁ + ··· + n_m))², where that node’s
respective subtrees have (n₁, ..., n_m) leaves. Write a computer program that generates
minimum-seek  time  trees  having  1,  2,  3,  ...  leaves,  based  on  this  formula. 

15. [M22] Show that Theorem F can be improved slightly if the elevator is initially
empty and if F(b)n ≠ t: At least ⌈(F(b)n + m − t)/(b + m)⌉ stops are necessary in
such  a case. 

16.  [23]  (R.  W.  Floyd.)  Find  an  elevator  schedule  that  transports  all  the  people 
of  (28)  to  their  destinations  in  at  most  12  stops.  (Configuration  (29)  shows  the  situation 
after  one  stop,  not  two.) 
► 17. [HM2S] (R. W. Floyd, 1980.) Show that the lower bound of Theorem F can be
improved to

$$\frac{n(b \ln n - \ln b - 1)}{\ln n + b\bigl(1 + \ln(1 + m/b)\bigr)},$$

in  the  sense  that  some  initial  configuration  must  require  at  least  this  many  stops.  [Hint: 
Count  the  configurations  that  can  be  obtained  after  s stops.] 

18.  [HM26]  Let  L be  the  lower  bound  of  exercise  17.  Show  that  the  average  number 
of elevator stops needed to take all people to their desired floors is at least L − 1, when
the  (bn)!  possible  permutations  of  people  into  bn  desks  are  equally  likely. 

► 19.  [25]  (B.  T.  Bennett  and  A.  C.  McKellar.)  Consider  the  following  approach  to 
keysorting,  illustrated  on  an  example  file  with  10  keys: 

i) Original file: (50, I₀)(08, I₁)(51, I₂)(06, I₃)(90, I₄)(17, I₅)(89, I₆)(27, I₇)(65, I₈)(42, I₉)
ii) Key file: (50, 0)(08, 1)(51, 2)(06, 3)(90, 4)(17, 5)(89, 6)(27, 7)(65, 8)(42, 9)

iii) Sorted (ii): (06, 3)(08, 1)(17, 5)(27, 7)(42, 9)(50, 0)(51, 2)(65, 8)(89, 6)(90, 4)

iv) Bin assignments (see below): (2, 1)(2, 3)(2, 5)(2, 7)(2, 8)(2, 9)(1, 0)(1, 2)(1, 4)(1, 6)

v) Sorted (iv): (1, 0)(2, 1)(1, 2)(2, 3)(1, 4)(2, 5)(1, 6)(2, 7)(2, 8)(2, 9)

vi) (i) distributed into bins using (v):

Bin 1: (50, I₀)(51, I₂)(90, I₄)(89, I₆)

Bin 2: (08, I₁)(06, I₃)(17, I₅)(27, I₇)(65, I₈)(42, I₉)

vii) The result of replacement selection, reading first bin 2, then bin 1:

(06, I₃)(08, I₁)(17, I₅)(27, I₇)(42, I₉)(50, I₀)(51, I₂)(65, I₈)(89, I₆)(90, I₄)

The  assignment  of  bin  numbers  in  step  (iv)  is  made  by  doing  replacement  selection 
on  (iii),  from  right  to  left,  in  decreasing  order  of  the  second  component.  The  bin 
number  is  the  run  number.  The  example  above  uses  replacement  selection  with  only 
two  elements  in  the  selection  tree;  the  same  size  tree  should  be  used  for  replacement 
selection  in  both  (iv)  and  (vii).  Notice  that  the  bin  contents  are  not  necessarily  in 
sorted  order! 

Prove  that  this  method  will  sort,  namely  that  the  replacement  selection  in  (vii) 
will produce only one run. (This technique reduces the number of bins needed in a
conventional  keysort  by  distribution,  especially  if  the  input  is  largely  in  order  already.) 

► 20. [25] Modern hardware/software systems provide programmers with a virtual memory:
Programs are written as if there were a very large internal memory, able to contain
all of the data. This memory is divided into pages, only a few of which are in the actual
internal  memory  at  any  one  time;  the  others  are  on  disks  or  drums.  Programmers  need 
not  concern  themselves  with  such  details,  since  the  system  takes  care  of  everything; 
new  pages  are  automatically  brought  into  memory  when  needed. 

It  would  seem  that  the  advent  of  virtual  memory  technology  makes  external  sorting 
methods  obsolete,  since  the  job  can  simply  be  done  using  the  techniques  developed  for 
internal  sorting.  Discuss  this  situation;  in  what  ways  might  a hand-tailored  external 
sorting  method  be  better  than  the  application  of  a general-purpose  paging  technique 
to  an  internal  sorting  method? 

► 21. [M15] How many blocks of an L-block file go on disk j when the file is striped on
D disks? 

22.  [22]  If  you  are  merging  two  files  with  the  Gilbreath  principle  and  you  want  to 
store the keys α_j with the a blocks and the keys β_j with the b blocks, in which block
should α_j be placed in order to have the information available when it is needed?
► 23. [20] How much space is needed for input buffers to keep input going continuously
when  two-way  merging  is  done  by  (a)  superblock  striping?  (b)  the  Gilbreath  principle? 

24. [M36] Suppose P runs have been striped on D disks so that block j of run k
appears on disk (x_k + j) mod D. A P-way merge will read those blocks in some
chronological order such as (19). If groups of D blocks are to be input continuously, we
will read at time t the chronologically tth block stored on each disk, as in (21). What
is the minimum number of buffer records needed in memory to hold input data that
has not yet been merged, regardless of the chronological order? Explain how to choose
the offsets x₁, x₂, ..., x_P so that the fewest buffers are needed in the worst case.

25.  [23]  Rework  the  text’s  example  of  randomized  striping  for  the  case  Q = 3 instead 
of Q = 4. What buffer contents would occur in place of (24)?

26. [26] How many output buffers will guarantee that a P-way merge with randomized
striping  will  never  have  to  pause  for  lack  of  a place  in  internal  memory  to  put  newly 
merged  output?  Assume  that  the  time  to  write  a block  equals  the  time  to  read  a block. 

27. [HM27] (The cyclic occupancy problem.) Suppose n empty urns have been arranged
in a circle and assigned the numbers 0, 1, ..., n − 1. For k = 1, 2, ..., p, we
throw m_k balls into urns (X_k + j) mod n for j = 0, 1, ..., m_k − 1, where the integers
X_k are chosen at random. Let S_n(m₁, ..., m_p) be the number of balls in urn 0, and let
E_n(m₁, ..., m_p) be the expected number of balls in the fullest urn.

a) Prove that E_n(m₁, ..., m_p) ≤ Σ_{t=1}^{m} min(1, n Pr(S_n(m₁, ..., m_p) ≥ t)), where
m = m₁ + ··· + m_p.

b) Use the tail inequality, Eq. 1.2.10–(25), to prove that

$$E_n(m_1, \ldots, m_p) \le \sum_{t=1}^{m} \min\Bigl(1, \frac{n(1 + \alpha_t/n)^m}{(1 + \alpha_t)^t}\Bigr)$$

for any nonnegative real numbers α₁, α₂, ..., α_m. What values of α₁, ..., α_m
give the best upper bound?

28. [HM47] Continuing exercise 27, is E_n(m₁, m₂, ..., m_p) ≥ E_n(m₁ + m₂, m₃, ..., m_p)?

► 29.  [M30]  The  purpose  of  this  exercise  is  to  derive  an  upper  bound  on  the  average 
time  needed  to  input  any  sequence  of  blocks  in  chronological  order  by  the  randomized 
striping  procedure,  when  the  blocks  represent  P runs  and  D disks.  We  say  that 
the  block  being  waited  for  at  each  time  step  as  the  algorithm  proceeds  (see  (24)) 
is  “marked”;  thus  the  total  input  time  is  proportional  to  the  number  of  marked  blocks. 
Marking  depends  only  on  the  chronological  sequence  of  disk  accesses  (see  (20)). 

a) Prove that if Q + 1 consecutive blocks in chronological order have N_j blocks on
disk j, then at most max(N₀, N₁, ..., N_{D−1}) of those blocks are marked.

b) Strengthen the result of (a) by showing that it holds also for Q + 2 consecutive
blocks.

c) Now use the cyclic occupancy problem of exercise 27 to obtain an upper bound on
the average running time in terms of a function r(D, Q + 2) as in Table 2, given
any chronological order.

30. [HM30] Prove that the function r(d, m) of exercise 29 satisfies r(d, sd log d) =
1 + O(1/√s) for fixed d as s → ∞.

31.  [HM48]  Analyze  randomized  striping  to  determine  its  true  average  behavior,  not 
merely an upper bound, as a function of P, Q, and D. (Even the case Q = 0, which
needs an average of Θ(L/√D) read cycles, is interesting.)
5.5. SUMMARY, HISTORY, AND BIBLIOGRAPHY

Now that we have nearly reached the end of this enormously long chapter, we
had  better  “sort  out”  the  most  important  facts  that  we  have  studied. 

An  algorithm  for  sorting  is  a procedure  that  rearranges  a file  of  records  so 
that  the  keys  are  in  ascending  order.  This  orderly  arrangement  is  useful  because 
it  brings  equal-key  records  together,  it  allows  efficient  processing  of  several  files 
that  are  sorted  on  the  same  key,  it  leads  to  efficient  retrieval  algorithms,  and  it 
makes  computer  output  look  less  chaotic. 

Internal  sorting  is  used  when  all  of  the  records  fit  in  the  computer’s  high 
speed  internal  memory.  We  have  studied  more  than  two  dozen  algorithms  for 
internal  sorting,  in  various  degrees  of  detail;  and  perhaps  we  would  be  happier 
if  we  didn’t  know  so  many  different  approaches  to  the  problem!  It  was  fun  to 
learn  all  the  techniques,  but  now  we  must  face  the  horrible  prospect  of  actually 
deciding  which  method  ought  to  be  used  in  a given  situation. 

It  would  be  nice  if  only  one  or  two  of  the  sorting  methods  would  dominate 
all  of  the  others,  regardless  of  the  application  or  the  computer  being  used.  But 
in  fact,  each  method  has  its  own  peculiar  virtues.  For  example,  the  bubble  sort 
(Algorithm  5.2.2B)  has  no  apparent  redeeming  features,  since  there  is  always 
a better  way  to  do  what  it  does;  but  even  this  technique,  suitably  generalized, 
turns  out  to  be  useful  for  two-tape  sorting  (see  Section  5.4.8).  Thus  we  find 
that  nearly  all  of  the  algorithms  deserve  to  be  remembered,  since  there  are  some 
applications  in  which  they  turn  out  to  be  best. 

The following brief survey gives the highlights of the most significant algorithms
we have encountered for internal sorting. As usual, N stands for the
number  of  records  in  the  given  file. 

1.  Distribution  counting,  Algorithm  5.2D,  is  very  useful  when  the  keys  have 
a small  range.  It  is  stable  (doesn’t  affect  the  order  of  records  with  equal  keys), 
but  requires  memory  space  for  counters  and  for  2N  records.  A modification  that 
saves  N of  these  record  spaces  at  the  cost  of  stability  appears  in  exercise  5.2-13. 

2. Straight insertion, Algorithm 5.2.1S, is the simplest method to program,
requires no extra space, and is quite efficient for small N (say N ≤ 25). For
large  N it  is  unbearably  slow  unless  the  input  is  nearly  in  order. 

3. Shellsort, Algorithm 5.2.1D, is also quite easy to program, and uses
minimum memory space; and it is reasonably efficient for moderately large N
(say N ≤ 1000).

4. List insertion, Algorithm 5.2.1L, uses the same basic idea as straight
insertion,  so  it  is  suitable  only  for  small  N.  Like  the  other  list  sorting  methods 
described  below,  it  saves  the  cost  of  moving  long  records  by  manipulating  links; 
this  is  particularly  advantageous  when  the  records  have  variable  length  or  are 
part  of  other  data  structures. 

5.  Address  calculation  techniques  are  efficient  when  the  keys  have  a known 
(usually uniform) distribution; the principal variants of this approach are
multiple list insertion (Program 5.2.1M), and MacLaren’s combined radix-insertion
method (discussed at the close of Section 5.2.5). The latter can be done with only
$O(\sqrt{N})$ cells of additional memory. A two-pass method that learns a nonuniform
distribution  is  discussed  in  Theorem  5.2.5T. 

6.  Merge  exchange,  Algorithm  5.2.2M  (Batcher’s  method)  and  its  cousin  the 
bitonic  sort  (exercise  5.3.4-10)  are  useful  when  a large  number  of  comparisons 
can  be  made  simultaneously. 

7.  Quicksort,  Algorithm  5.2.2Q  (Hoare’s  method)  is  probably  the  most  useful 
general-purpose  technique  for  internal  sorting,  because  it  requires  very  little 
memory  space  and  its  average  running  time  on  most  computers  beats  that  of 
its  competitors  when  it  is  well  implemented.  It  can  run  very  slowly  in  its  worst 
case,  however,  so  a careful  choice  of  the  partitioning  elements  should  be  made 
whenever  nonrandom  data  are  likely.  Choosing  the  median  of  three  elements,  as 
suggested  in  exercise  5.2.2-55,  makes  the  worst-case  behavior  extremely  unlikely 
and  also  improves  the  average  running  time  slightly. 

8.  Straight  selection,  Algorithm  5.2.3S,  is  a simple  method  especially  suitable 
when  special  hardware  is  available  to  find  the  smallest  element  of  a list  rapidly. 

9. Heapsort, Algorithm 5.2.3H, requires minimum memory and is guaranteed
to run pretty fast; its average time and its maximum time are both roughly
twice  the  average  running  time  of  quicksort. 

10.  List  merging,  Algorithm  5.2.4L,  is  a list  sort  that,  like  heapsort,  is 
guaranteed  to  be  rather  fast  even  in  its  worst  case;  moreover,  it  is  stable  with 
respect  to  equal  keys. 

11. Radix sorting, using Algorithm 5.2.5R, is a list sort especially appropriate
for keys that are either rather short or that have an unusual lexicographic
collating sequence. The method of distribution counting (point 1 above) can also
be used, as an alternative to linking; such a procedure requires 2N record spaces,
plus a table of counters, but the simple form of its inner loop makes it especially
good for ultra-fast, “number-crunching” computers that have look-ahead control.
Caution: Radix sorting should not be used for small N!

12.  Merge  insertion,  see  Section  5.3.1,  is  especially  suitable  for  very  small 
values of N, in a “straight-line-coded” routine; for example, it would be the
appropriate  method  in  an  application  that  requires  the  sorting  of  numerous 
five-  or  six-record  groups. 

13.  Hybrid  methods,  combining  one  or  more  of  the  techniques  above,  are  also 
possible.  For  example,  merge  insertion  could  be  used  for  sorting  short  subfiles 
that  arise  in  quicksort. 

14. Finally, an unnamed method appearing in the answer to exercise 5.2.1-3
seems to require the shortest possible sorting program. But its average running
time, proportional to $N^3$, makes it the slowest sorting routine in this book!
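Of the methods surveyed above, distribution counting (point 1) is short enough to sketch directly. The Python below is a minimal illustration, not a transcription of Algorithm 5.2D. The function name `distribution_counting`, the `key` accessor, and the requirement that every key is an integer in a known range `u..v` are assumptions made for this sketch. It keeps one counter per possible key value and builds a separate output area of N records, which accounts for the 2N record spaces. The final right-to-left scan is what keeps records with equal keys in their original order.

```python
def distribution_counting(records, key, u, v):
    """Stable sort of records whose integer keys key(r) lie in u..v.

    Uses v - u + 1 counters plus an output area of len(records) slots,
    so the input and output together occupy 2N record spaces.
    """
    count = [0] * (v - u + 1)
    for r in records:                  # tally how often each key occurs
        count[key(r) - u] += 1
    for i in range(1, len(count)):     # now count[i] = number of keys <= u + i
        count[i] += count[i - 1]
    out = [None] * len(records)
    for r in reversed(records):        # right-to-left scan preserves stability
        k = key(r) - u
        count[k] -= 1
        out[count[k]] = r
    return out
```

For example, `distribution_counting([(3, 'a'), (1, 'b'), (3, 'c')], key=lambda r: r[0], u=1, v=3)` returns `[(1, 'b'), (3, 'a'), (3, 'c')]`. The two records with key 3 come out in their input order.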

Table  1 summarizes  the  speed  and  space  characteristics  of  many  of  these 
methods,  when  programmed  for  MIX.  It  is  important  to  realize  that  the  figures 
in  this  table  are  only  rough  indications  of  the  relative  sorting  times;  they  apply 
to  one  computer  only,  and  the  assumptions  made  about  input  data  are  not 
completely consistent for all programs. Comparative tables such as this have
been  given  by  many  authors,  with  no  two  people  reaching  the  same  conclusions. 
On  the  other  hand,  the  timings  do  give  at  least  an  indication  of  the  kind  of 
speed  to  be  expected  from  each  algorithm,  when  sorting  a rather  small  array  of 
one-word  records,  since  MIX  is  a fairly  typical  computer. 

The  “space”  column  in  Table  1 gives  some  information  about  the  amount 
of  auxiliary  memory  used  by  each  program,  in  units  of  record  length.  Here 
ε denotes the fraction of a record needed for one link field; thus, for example,
N(1 + ε) means that the method requires space for N records plus N link fields.

The  asymptotic  average  and  maximum  times  appearing  in  Table  1 give  only 
the  leading  terms  that  dominate  for  large  N,  assuming  random  input;  c denotes 
an  unspecified  constant.  These  formulas  can  often  be  misleading,  so  actual  total 
running  times  have  also  been  listed,  for  sample  runs  of  the  program  on  two 
particular  sequences  of  input  data.  The  case  N = 16  refers  to  the  sixteen  keys 
that  appear  in  so  many  of  the  examples  of  Section  5.2;  and  the  case  N = 1000 
refers to the sequence $K_1, K_2, \ldots, K_{1000}$ defined by

$$K_{1001} = 0; \qquad K_{n-1} = (3141592621\,K_n + 2113148651) \bmod 10^{10}.$$

A MIX program of reasonably high quality has been used to represent each
algorithm in the table, often incorporating improvements that have been suggested
in  the  exercises.  The  byte  size  for  these  runs  was  100. 
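The N = 1000 key sequence defined above is easy to reproduce, for instance to rerun comparisons of this kind on another machine. The sketch below is our own illustration, and the function name `benchmark_keys` is hypothetical. It starts from $K_{1001} = 0$ and applies the recurrence downward until it reaches $K_1$. Python integers are unbounded, so the product $3141592621\,K_n$ is computed exactly and needs no overflow handling.

```python
def benchmark_keys():
    """Return [K_1, ..., K_1000], where K_1001 = 0 and
    K_{n-1} = (3141592621 * K_n + 2113148651) mod 10**10."""
    k = [0] * 1002                 # k[n] holds K_n; k[1001] = 0 is the seed
    for n in range(1001, 1, -1):   # compute K_1000 down to K_1
        k[n - 1] = (3141592621 * k[n] + 2113148651) % 10**10
    return k[1:1001]               # drop the unused k[0] and the seed k[1001]
```

The last element of the returned list is $K_{1000} = 2113148651$, because it is computed from the seed 0.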

External  sorting  techniques  are  different  from  internal  sorting,  because  they 
must  use  comparatively  primitive  data  structures,  and  because  there  is  a great 
emphasis on minimizing their input/output time. Section 5.4.6 summarizes the
interesting  methods  that  have  been  developed  for  tape  merging,  and  Section  5.4.9 
discusses  the  use  of  disks  and  drums. 

Of  course,  sorting  isn’t  the  whole  story.  While  studying  all  of  these  sorting 
techniques,  we  have  learned  a good  deal  about  how  to  handle  data  structures, 
how  to  deal  with  external  memories,  and  how  to  analyze  algorithms;  and  perhaps 
we  have  even  learned  a little  about  how  to  discover  new  algorithms. 

Early  developments.  A search  for  the  origin  of  today’s  sorting  techniques 
takes  us  back  to  the  nineteenth  century,  when  the  first  machines  for  sorting 
were  invented.  The  United  States  conducts  a census  of  all  its  citizens  every  ten 
years,  and  by  1880  the  problem  of  processing  the  voluminous  census  data  was 
becoming  very  acute;  in  fact,  the  total  number  of  single  (as  opposed  to  married) 
people  was  never  tabulated  that  year,  although  the  necessary  information  had 
been  gathered.  Herman  Hollerith,  a 20-year-old  employee  of  the  Census  Bureau, 
devised  an  ingenious  electric  tabulating  machine  to  meet  the  need  for  better 
statistics-gathering,  and  about  100  of  his  machines  were  successfully  used  to 
tabulate  the  1890  census  rolls. 

Figure 94 shows Hollerith’s original battery-driven apparatus; of chief interest
to us is the “sorting box” at the right, which has been opened to show half of
the 26 inner compartments. The operator would insert a 6⅝″ × 3¼″ punched card
into the “press” and lower the handle; this caused spring-actuated pins in the
upper plate to make contact with pools of mercury in the lower plate, wherever
a hole was punched in the card. The corresponding completed circuits would
cause associated dials on the panel to advance by one unit; and furthermore,
one of the 26 lids of the sorting box would pop open. At this point the operator
would reopen the press, put the card into the open compartment, and close the
lid. One man reportedly ran 19071 cards through this machine in a single
6½-hour working day, an average of about 49 cards per minute! (A typical operator
would work at about one-third this speed.)


Fig.  94.  Hollerith’s  original  tabulating  and  sorting  machine.  (Photo  courtesy  of  IBM 
archives.) 


Population continued its inexorable growth, and the original tabulator-sorters
were not fast enough to handle the 1900 census; so Hollerith devised
another  machine  to  stave  off  another  data  processing  crisis.  His  new  device 
(patented  in  1901  and  1904)  had  an  automatic  card  feed,  and  in  fact  it  looked 
essentially  like  modern  card  sorters.  The  story  of  Hollerith’s  early  machines 
has  been  told  in  interesting  detail  by  Leon  E.  Truesdell,  The  Development  of 
Punch  Card  Tabulation  (Washington:  U.S.  Bureau  of  the  Census,  1965);  see  also 
the  contemporary  accounts  in  Columbia  College  School  of  Mines  Quarterly  10 
(1889), 238-255; J. Franklin Inst. 129 (1890), 300-306; The Electrical Engineer
12 (November 11, 1891), 521-530; J. Amer. Statistical Assn. 2 (1891), 330-341,
4 (1895), 365; J. Royal Statistical Soc. 55 (1892), 326-327; Allgemeines
statistisches Archiv 2 (1892), 78-126; J. Soc. Statistique de Paris 33 (1892),
87-96; U.S. Patents 395781 (1889), 685608 (1901), 777209 (1904). Hollerith and
another former Census Bureau employee, James Powers, went on to found rival
companies that eventually became part of IBM and Remington Rand corporations,
respectively.

Hollerith’s  sorting  machine  is,  of  course,  the  basis  for  radix  sorting  methods 
now  used  in  digital  computers.  His  patent  mentions  that  two-column  numerical 
items  are  to  be  sorted  “separately  for  each  column,”  but  he  didn’t  say  whether 
the  units  or  the  tens  columns  should  be  considered  first.  Patent  number  518240 
by  John  K.  Gore  in  1894,  which  described  another  early  machine  for  sorting 
cards,  suggested  starting  with  the  tens  column.  The  nonobvious  trick  of  using 
the  units  column  first  was  presumably  discovered  by  some  anonymous  machine 
operator  and  passed  on  to  others  (see  Section  5.2.5);  it  appears  in  the  earliest 
extant  IBM  sorter  manual  (1936).  The  first  known  mention  of  this  right-to-left 
technique  is  in  a book  by  Robert  Feindler,  Das  Hollerith-Lochkarten-Verfahren 
(Berlin:  Reimar  Hobbing,  1929),  126-130;  it  was  also  mentioned  at  about  the 
same  time  in  an  article  by  L.  J.  Comrie,  Transactions  of  the  Office  Machinery 
Users’  Association  (London:  1929-1930),  25-37.  Incidentally,  Comrie  was  the 
first  person  to  make  the  important  observation  that  tabulating  machines  could 
fruitfully  be  employed  in  scientific  calculations,  even  though  they  were  originally 
designed  for  statistical  and  accounting  applications.  His  article  is  especially 
interesting  because  it  gives  a detailed  description  of  the  tabulating  equipment 
available  in  England  in  1930.  Sorting  machines  at  that  time  processed  360  to 
400  cards  per  minute,  and  could  be  rented  for  £9  per  month. 
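The units-first trick works because each pass of a card sorter is stable: cards that land in the same pocket keep their relative order. The sketch below simulates a two-column sort. The function names, and the representation of a card as a two-digit integer, are assumptions made only for illustration. If the tens column is sorted first and the pockets are simply restacked after a units pass, the output is not in order. To work tens-first, the operator would have to sort each tens pile separately on the units column.

```python
def card_sorter_pass(deck, column):
    """One pass of a card sorter on decimal `column` (0 = units, 1 = tens).

    Each card drops into pocket 0-9 by its digit in that column, and the
    pockets are then stacked in order. Cards in the same pocket keep their
    relative order, so each pass is stable.
    """
    pockets = [[] for _ in range(10)]
    for card in deck:
        pockets[(card // 10 ** column) % 10].append(card)
    return [card for pocket in pockets for card in pocket]


def sort_two_columns(deck, units_first=True):
    """Run one sorter pass per column of a two-digit field."""
    for column in ((0, 1) if units_first else (1, 0)):
        deck = card_sorter_pass(deck, column)
    return deck
```

`sort_two_columns([23, 15, 21, 12, 35, 13])` yields `[12, 13, 15, 21, 23, 35]`. With `units_first=False` the same deck comes out as `[21, 12, 13, 23, 15, 35]`, because the final units pass scrambles the tens order.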

The  idea  of  merging  goes  back  to  another  card-walloping  machine,  the 
collator,  which  was  a much  later  invention  (1936).  With  its  two  feeding  stations, 
it  could  merge  two  sorted  decks  of  cards  into  one,  in  only  one  pass;  the  technique 
for  doing  this  was  clearly  explained  in  the  first  IBM  collator  manual  (April  1939). 
[See  Ralph  E.  Page,  U.S.  Patent  2359670  (1944).] 
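The collator's one-pass merge of two sorted decks amounts to repeatedly comparing the two front cards and taking the smaller. Below is a minimal Python sketch, assuming both input decks are already in ascending order. The name `collate` is a hypothetical label for this illustration, not the name of any documented IBM operation.

```python
def collate(deck_a, deck_b):
    """Merge two ascending decks into one ascending deck in a single pass.

    On equal keys the card from deck_a goes first, so the merge is stable.
    """
    out = []
    i = j = 0
    while i < len(deck_a) and j < len(deck_b):
        if deck_b[j] < deck_a[i]:
            out.append(deck_b[j])
            j += 1
        else:
            out.append(deck_a[i])
            i += 1
    # One deck is exhausted; the rest of the other passes straight through.
    out.extend(deck_a[i:])
    out.extend(deck_b[j:])
    return out
```

Each card is handled exactly once, which is why a single pass suffices.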

Then  computers  arrived  on  the  scene,  and  sorting  was  intimately  involved 
in  this  development;  in  fact,  there  is  evidence  that  a sorting  routine  was  the 
first  program  ever  written  for  a stored-program  computer.  The  designers  of 
EDVAC  were  especially  interested  in  sorting,  because  it  epitomized  the  potential 
nonnumerical  applications  of  computers;  they  realized  that  a satisfactory  order 
code should not only be capable of expressing programs for the solution of difference
equations, it must also have enough flexibility to handle the combinatorial
“decision-making”  aspects  of  algorithms.  John  von  Neumann  therefore  prepared 
programs  for  internal  merge  sorting  in  1945,  in  order  to  test  the  adequacy  of  some 
instruction  codes  he  was  proposing  for  the  EDVAC  computer.  The  existence 
of  efficient  special-purpose  sorting  machines  provided  a natural  standard  by 
which  the  merits  of  his  proposed  computer  organization  could  be  evaluated. 
Details  of  this  interesting  development  have  been  described  in  an  article  by  D.  E. 
Knuth,  Computing  Surveys  2 (1970),  247-260;  see  also  von  Neumann’s  Collected 
Works 5 (New York: Macmillan, 1963), 196-214, for the final polished form of
his  original  sorting  programs. 

In Germany, K. Zuse independently constructed a program for straight insertion
sorting in 1945, as one of the simplest examples of linear list operations in his
“Plankalkül” language. (This pioneering work remained unpublished for nearly
30 years; see Berichte der Gesellschaft für Mathematik und Datenverarbeitung
63  (Bonn:  1972),  part  4,  84-85.) 

The  limited  internal  memory  size  planned  for  early  computers  made  it 
natural  to  think  of  external  sorting  as  well  as  internal  sorting,  and  a “Progress 
Report  on  the  EDVAC”  prepared  by  J.  P.  Eckert  and  J.  W.  Mauchly  of  the 
Moore  School  of  Electrical  Engineering  (30  September  1945)  pointed  out  that 
a computer  augmented  with  magnetic  wire  or  tape  devices  could  simulate  the 
operations  of  card  equipment,  achieving  a faster  sorting  speed.  This  progress 
report  described  balanced  two-way  radix  sorting,  and  balanced  two-way  merging 
(called  “collating”),  using  four  magnetic  wire  or  tape  units,  reading  or  writing 
“at  least  5000  pulses  per  second.” 
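Balanced two-way merging on four units can be simulated with Python lists standing in for tapes. The sketch is illustrative only: it ignores blocking, rewinding, and transfer speed, and the function names are hypothetical. The initial runs are dealt alternately onto two input tapes. Each pass merges corresponding runs and deals the results alternately onto the two output tapes, which then become the inputs for the next pass. Each pass halves the number of runs (rounding up), so R initial runs need $\lceil \lg R \rceil$ passes.

```python
def merge_runs(x, y):
    """Merge two ascending runs into one ascending run."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if y[j] < x[i]:
            out.append(y[j])
            j += 1
        else:
            out.append(x[i])
            i += 1
    return out + x[i:] + y[j:]


def balanced_two_way_merge(runs):
    """Return (sorted_output, passes); `runs` is a list of ascending lists."""
    tapes = [runs[0::2], runs[1::2]]       # deal initial runs onto tapes 1 and 2
    passes = 0
    while len(tapes[0]) + len(tapes[1]) > 1:
        a, b = tapes
        out = [[], []]                     # the two output tapes
        for r in range(max(len(a), len(b))):
            x = a[r] if r < len(a) else []
            y = b[r] if r < len(b) else []
            out[r % 2].append(merge_runs(x, y))
        tapes = out                        # output tapes become input tapes
        passes += 1
    return (tapes[0][0] if tapes[0] else []), passes
```

Five initial runs, for example, take three passes.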

John  Mauchly  lectured  on  “Sorting  and  Collating”  at  the  special  session 
on  computing  presented  at  the  Moore  School  in  1946,  and  the  notes  of  his 
lecture constitute the first published discussion of computer sorting [Theory and
Techniques for the Design of Electronic Digital Computers, edited by G. W.
Patterson, 3 (1946), 22.1-22.20]. Mauchly began his presentation with an interesting
remark: “To ask that a single machine combine the abilities to compute
and  to  sort  might  seem  like  asking  that  a single  device  be  able  to  perform  both 
as  a can  opener  and  a fountain  pen.”  Then  he  observed  that  machines  capable  of 
carrying  out  sophisticated  mathematical  procedures  must  also  have  the  ability 
to  sort  and  classify  data,  and  he  showed  that  sorting  may  even  be  useful  in 
connection  with  numerical  calculations.  He  described  straight  insertion  and 
binary insertion, observing that the former method uses about $N^2/4$ comparisons
on the average, while the latter never needs more than about $N \lg N$. Yet binary
insertion  requires  a rather  complex  data  structure,  and  he  went  on  to  show 
that  two-way  merging  achieves  the  same  low  number  of  comparisons  using  only 
sequential  accessing  of  lists.  The  last  half  of  his  lecture  notes  were  devoted  to  a 
discussion  of  partial-pass  radix  sorting  methods  that  simulate  digital  card  sorting 
on  four  tapes,  using  fewer  than  four  passes  per  digit  (see  Section  5.4.7). 
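Mauchly's two estimates can be checked empirically by counting key comparisons. Both functions below are hypothetical helpers written for this illustration. Each sorts a copy of its input and returns the result together with the number of comparisons made. Binary insertion locates each insertion point by bisection, but it still shifts the larger records one place right, just as straight insertion does. That data movement is why the method calls for a more complex data structure in practice. On a random permutation the first count should come out roughly $N^2/4$, and the second should stay below about $N \lg N$.

```python
def straight_insertion_comparisons(keys):
    """Straight insertion; returns (sorted_list, number_of_comparisons)."""
    a, comparisons = list(keys), 0
    for j in range(1, len(a)):
        k, i = a[j], j - 1
        while i >= 0:
            comparisons += 1
            if a[i] <= k:          # found the place for k
                break
            a[i + 1] = a[i]        # shift a larger key one place right
            i -= 1
        a[i + 1] = k
    return a, comparisons


def binary_insertion_comparisons(keys):
    """Binary insertion; returns (sorted_list, number_of_comparisons)."""
    a, comparisons = list(keys), 0
    for j in range(1, len(a)):
        k, lo, hi = a[j], 0, j     # insertion point lies in positions 0..j
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if k < a[mid]:
                hi = mid
            else:
                lo = mid + 1       # equal keys stay ahead of k (stable)
        a[lo + 1:j + 1] = a[lo:j]  # shift a[lo..j-1] one place right
        a[lo] = k
    return a, comparisons
```

On the reversed input `[10, 9, ..., 1]`, straight insertion makes 1 + 2 + ... + 9 = 45 comparisons, its worst case. Binary insertion needs at most 25 comparisons on any ten keys.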

Shortly  afterwards,  Eckert  and  Mauchly  started  a company  that  produced 
some of the earliest electronic computers, the BINAC (for military applications)
and  the  UNIVAC  (for  commercial  applications).  Again  the  U.S.  Census  Bureau 
played  a part  in  this  development,  receiving  the  first  UNIVAC.  At  this  time  it 
was  not  at  all  clear  that  computers  would  be  economically  profitable;  computing 
machines  could  sort  faster  than  card  equipment,  but  they  cost  more.  Therefore 
the  UNIVAC  programmers,  led  by  Frances  E.  Snyder,  put  considerable  effort 
into  the  design  of  high-speed  external  sorting  routines,  and  their  preliminary 
programs  also  influenced  the  hardware  design.  According  to  their  estimates,  100 
million  10-word  records  could  be  sorted  on  UNIVAC  in  9000  hours,  or  375  days. 

UNIVAC  I,  officially  dedicated  in  July  1951,  had  an  internal  memory  of  1000 
12-character  (72-bit)  words.  It  was  designed  to  read  and  write  60-word  blocks 
on  tapes,  at  a rate  of  500  words  per  second;  reading  could  be  either  forward 
or  backward,  and  simultaneous  reading,  writing,  and  computing  was  possible. 
In 1948, Snyder devised an interesting way to do two-way merging with perfect
overlap  of  reading,  writing,  and  computing,  using  six  input  buffers:  Let  there  be 
one  “current  buffer”  and  two  “auxiliary  buffers”  for  each  input  file;  it  is  possible 
to  merge  in  such  a way  that,  whenever  it  is  time  to  output  one  block,  the  two 
current  input  buffers  contain  a total  of  exactly  one  block’s  worth  of  unprocessed 
records.  Therefore  exactly  one  input  buffer  becomes  empty  while  each  output 
block  is  being  formed,  and  we  can  arrange  to  have  three  of  the  four  auxiliary 
buffers  full  at  all  times  while  we  are  reading  into  the  other.  This  method  is 
slightly faster than the forecasting method of Algorithm 5.4.6F, since it is not
necessary to inspect the result of one input before initiating the next. [See
Collation Methods for the UNIVAC System (Eckert-Mauchly Computer Corp.,
1950),  2 volumes.] 

The  culmination  of  this  work  was  a sort  generator  program,  which  was  the 
first  major  software  routine  ever  developed  for  automatic  programming.  The  user 
would  specify  the  record  size,  the  positions  of  up  to  five  keys  in  partial  fields  of 
each  record,  and  the  sentinel  keys  that  mark  file’s  end;  then  the  sort  generator 
would  produce  a copyrighted  sorting  program  for  one-reel  files.  The  first  pass 
of this program was an internal sort of 60-word blocks, using comparison counting
(Algorithm 5.2C); then came a number of balanced two-way merge passes,
reading  backwards  and  avoiding  tape  interlock  as  described  above.  [See  “Master 
Generating  Routine  for  2-way  Sorting”  (Eckert-Mauchly  Division  of  Remington 
Rand,  1952);  the  first  draft  of  this  report  was  entitled  “Master  Prefabrication 
Routine for 2-way Collation.” See also Frances E. [Snyder] Holberton, Symposium
on Automatic Programming (Office of Naval Research, 1954), 34-39.]

By 1952, many approaches to internal sorting were well known in the programming
folklore, but comparatively little theory had been developed. Daniel
Goldenberg [“Time analyses of various methods of sorting data,” Digital Computer
Laboratory memo M-1680 (Mass. Inst. of Tech., 17 October 1952)] coded five
different methods for the Whirlwind computer, and made best-case and worst-case
analyses of each program. When sorting one hundred 15-bit words on an
8-bit  key,  he  found  that  the  fastest  method  was  to  use  a 256-word  table,  storing 
each  record  into  a unique  position  corresponding  to  its  key,  then  compressing  the 
table.  But  this  technique  had  an  obvious  disadvantage,  since  it  would  eliminate  a 
record  whenever  a subsequent  one  had  the  same  key.  The  other  four  methods  he 
analyzed  were  ranked  as  follows:  Straight  two-way  merging  beat  radix-2  sorting 
beat  straight  selection  beat  bubble  sort. 
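The table method that Goldenberg found fastest can be sketched as follows. The sketch rests on several assumptions of our own: integer keys in the range 0..255 (an 8-bit key), the hypothetical name `table_sort`, and `None` as the marker for an empty slot, so the records themselves must not be `None`. The overwrite on a repeated key is exactly the defect described above.

```python
def table_sort(records, key, key_range=256):
    """Store each record at table[key(record)], then compress the table.

    A later record with the same key overwrites the earlier one, so
    records with duplicate keys are silently lost.
    """
    table = [None] * key_range
    for r in records:
        table[key(r)] = r          # direct placement by key value
    return [r for r in table if r is not None]   # compress out empty slots
```

For instance, `table_sort([(7, 'a'), (3, 'b'), (7, 'c')], key=lambda r: r[0])` returns only `[(3, 'b'), (7, 'c')]`, because the record `(7, 'a')` has been overwritten.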

Goldenberg’s  results  were  extended  by  Harold  H.  Seward  in  his  1954  Master’s 
thesis  [“Information  sorting  in  the  application  of  electronic  digital  computers  to 
business operations,” Digital Computer Lab. report R-232 (Mass. Inst. of Tech.,
24 May 1954; 60 pages)]. Seward introduced the ideas of distribution counting
and replacement selection; he showed that the first run in a random permutation
has an average length of $e - 1$; and he analyzed external sorting as well as internal
sorting,  on  various  types  of  bulk  memories  as  well  as  tapes. 
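The value $e - 1$ has a short explanation. The initial ascending run has length at least $k$ exactly when the first $k$ elements are in increasing order, and that happens with probability $1/k!$. So the mean length of the first run in a random permutation of $n$ elements is $\sum_{k=1}^{n} 1/k!$, which tends to $e - 1$ as $n$ grows. The brute-force check below is our own illustration, not Seward's derivation, and the function names are hypothetical. It enumerates all $n!$ permutations with exact rational arithmetic, so it is only practical for small $n \ge 1$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def first_run_length(p):
    """Length of the initial ascending run of a nonempty sequence p."""
    n = 1
    while n < len(p) and p[n - 1] < p[n]:
        n += 1
    return n


def average_first_run(n):
    """Exact mean first-run length over all n! permutations of 1..n."""
    total = sum(first_run_length(p) for p in permutations(range(1, n + 1)))
    return Fraction(total, factorial(n))


def partial_e_minus_1(n):
    """1/1! + 1/2! + ... + 1/n!, which tends to e - 1."""
    return sum(Fraction(1, factorial(k)) for k in range(1, n + 1))
```

For every small n the two exact values agree. `partial_e_minus_1(12)` is already within about $10^{-9}$ of $e - 1 \approx 1.7182818$.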

An  even  more  noteworthy  thesis  — a Ph.D.  thesis  in  fact  — was  written  by 
Howard  B.  Demuth  in  1956  [“Electronic  Data  Sorting”  (Stanford  University, 
October 1956), 92 pages; IEEE Trans. C-34 (1985), 296-310]. This work helped
to lay the foundations of computational complexity theory. It considered three
abstract models of the sorting problem, using cyclic, linear, and random-access
memories; and optimal or near-optimal methods were developed for each model.
(See exercise 5.3.4-68.) Although no practical consequences flowed immediately
from  Demuth’s  thesis,  it  established  important  ideas  about  how  to  link  theory 
with  practice. 

Thus  the  history  of  sorting  has  been  closely  associated  with  many  “firsts” 
in  computing:  the  first  data-processing  machines,  the  first  stored  programs,  the 
first  software,  the  first  buffering  methods,  the  first  work  on  algorithmic  analysis 
and  computational  complexity. 

None  of  the  computer-related  documents  mentioned  so  far  actually  appeared 
in the open literature; in fact, most of the early history of computing appears
in comparatively inaccessible reports, because comparatively few people were
involved with computers at the time. Literature about sorting finally broke into
print in 1955-1956, in the form of three major survey articles.

The  first  paper  was  prepared  by  J.  C.  Hosken  [Proc.  Eastern  Joint  Computer 
Conference  8 (1955),  39-55].  He  began  with  an  astute  observation:  “To  lower 
costs  per  unit  of  output,  people  usually  increase  the  size  of  their  operations.  But 
under  these  conditions,  the  unit  cost  of  sorting,  instead  of  falling,  rises.”  Hosken 
surveyed  all  the  available  special-purpose  equipment  then  being  marketed,  as 
well  as  the  methods  of  sorting  on  computers.  His  bibliography  of  54  items  was 
based  mostly  on  manufacturers’  brochures. 

The  comprehensive  paper  “Sorting  on  Electronic  Computer  Systems”  by 
E. H. Friend [JACM 3 (1956), 134-168] was a major milestone in the development
of sorting. Although numerous techniques have been developed since
1956,  this  paper  is  still  remarkably  up-to-date  in  many  respects.  Friend  gave 
careful  descriptions  of  quite  a few  internal  and  external  sorting  algorithms 
and  he  paid  special  attention  to  buffering  techniques  and  the  characteristics 
of  magnetic  tape  units.  He  introduced  some  new  methods  (for  example,  tree 
selection,  amphisbaenic  sorting,  and  forecasting),  and  developed  some  of  the 
mathematical  properties  of  the  older  methods. 

The  third  survey  of  sorting  to  appear  about  this  time  was  prepared  by 
D. W. Davies [Proc. Inst. Elect. Engineers 103B, Supplement 1 (1956), 87-93].
In the following years several other notable surveys were published, by D. A. Bell
[Comp. J. 1 (1958), 71-77]; A. S. Douglas [Comp. J. 2 (1959), 1-9]; D. D.
McCracken, H. Weiss, and T. Lee [Programming Business Computers (New York:
Wiley,  1959),  Chapter  15,  pages  298-332];  I.  Flores  [JACM  8 (1961),  41-80]; 
K.  E.  Iverson  [A  Programming  Language  (New  York:  Wiley,  1962),  Chapter  6, 
176-245];  C.  C.  Gotlieb  [CACM  6 (1963),  194-201];  T.  N.  Hibbard  [CACM  6 
(1963),  206-213];  M.  A.  Goetz  [Digital  Computer  User’s  Handbook,  edited  by 
M.  Klerer  and  G.  A.  Korn  (New  York:  McGraw-Hill,  1967),  Chapter  1.10,  pages 
1.292-1.320]. A symposium on sorting was sponsored by ACM in November
1962;  most  of  the  papers  presented  at  that  symposium  were  published  in  the 
May  1963  issue  of  CACM,  and  they  constitute  a good  representation  of  the  state 
of  the  art  at  that  time.  C.  C.  Gotlieb’s  survey  of  contemporary  sort  generators, 
T. N. Hibbard’s survey of minimal storage internal sorting, and G. U. Hubbard’s
early  exploration  of  disk  file  sorting  are  particularly  noteworthy  articles  in  this 
collection. 

New  sorting  methods  were  being  discovered  throughout  this  period:  Address 
calculation  (1956),  merge  insertion  (1959),  radix  exchange  (1959),  cascade  merge 
(1959),  shellsort  (1959),  polyphase  merge  (1960),  tree  insertion  (1960),  oscillating 
sort  (1962),  Hoare’s  quicksort  (1962),  Williams’s  heapsort  (1964),  Batcher’s 
merge  exchange  (1964).  The  history  of  each  individual  algorithm  has  been  traced 
in  the  particular  section  of  this  chapter  where  that  method  is  described.  The 
late  1960s  saw  an  intensive  development  of  the  corresponding  theory. 

A complete  bibliography  of  all  papers  on  sorting  examined  by  the  author 
as  this  chapter  was  first  being  written,  compiled  with  the  help  of  R.  L.  Rivest, 
appeared  in  Computing  Reviews  13  (1972),  283-289. 

Later  developments.  Dozens  of  sorting  algorithms  have  been  invented  since 
1970,  although  nearly  all  of  them  are  variations  on  earlier  themes.  Multikey 
quicksort,  which  is  discussed  in  the  answer  to  exercise  5.2.2-30,  is  an  excellent 
example  of  such  more  recent  methods. 

Another  trend,  primarily  of  theoretical  interest  so  far,  has  been  to  study 
sorting  schemes  that  are  adaptive,  in  the  sense  that  they  are  guaranteed  to 
run  faster  when  the  input  is  already  pretty  much  in  order  according  to  various 
criteria. See, for example, H. Mannila, IEEE Transactions C-34 (1985),
318-325; V. Estivill-Castro and D. Wood, Computing Surveys 24 (1992), 441-476;
C. Levcopoulos and O. Petersson, Journal of Algorithms 14 (1993), 395-413;
A. Moffat, G. Eddy, and O. Petersson, Software Practice & Experience 26 (1996),
781-797. 
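A natural merge sort gives a simple illustration of the adaptive idea. It first splits the input into the ascending runs it already contains, then merges adjacent runs pairwise until only one remains. The fewer runs the input has, the fewer merging passes are needed, and an input that is already sorted needs none. The sketch below is illustrative only and is not one of the methods in the papers cited above. The function names are hypothetical, and the pass count is returned so that the adaptivity is visible.

```python
def merge(x, y):
    """Stable merge of two ascending lists."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if y[j] < x[i]:
            out.append(y[j])
            j += 1
        else:
            out.append(x[i])
            i += 1
    return out + x[i:] + y[j:]


def natural_merge_sort(keys):
    """Return (sorted_list, passes) after merging the input's existing runs."""
    runs, start = [], 0
    for i in range(1, len(keys) + 1):
        # A run ends at the end of the input or where the order descends.
        if i == len(keys) or keys[i] < keys[i - 1]:
            runs.append(list(keys[start:i]))
            start = i
    passes = 0
    while len(runs) > 1:
        merged = []
        for r in range(0, len(runs), 2):
            if r + 1 < len(runs):
                merged.append(merge(runs[r], runs[r + 1]))
            else:
                merged.append(runs[r])   # odd run out is carried forward
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes
```

A sorted input of any length returns after zero passes. A reversed input of five keys has five runs of length 1 and needs three passes.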

Changes  in  computer  hardware  have  prompted  many  interesting  studies 
of  the  efficiency  of  sorting  algorithms  when  the  cost  criteria  change;  see,  for 
example,  the  discussion  of  virtual  memory  in  exercise  5.4.9-20.  The  effect  of 
hardware  caches  on  internal  sorting  has  been  studied  by  A.  LaMarca  and  R.  E. 
Ladner,  J.  Algorithms  31  (1999),  66-104.  One  of  their  conclusions  is  that  step  Q9 
of  Algorithm  5.2.2Q  is  a bad  idea  on  modern  machines  (although  it  worked  well 
on  traditional  computers  like  MIX):  Instead  of  finishing  quicksort  with  a straight 
insertion  sort,  it  is  now  better  to  sort  the  short  subfiles  earlier,  while  their  keys 
are  still  in  the  cache. 
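The difference that LaMarca and Ladner point out concerns only when the short subfiles get sorted, so one routine can show both orders. The sketch below is our own illustration, and all names are hypothetical, as is the cutoff value 9. With `defer=True` it leaves subfiles of at most `CUTOFF` records untouched and finishes with a single straight insertion pass over the whole array, in the manner of step Q9. By default it sorts each short subfile as soon as partitioning produces it. Python cannot exhibit the cache effect itself; the sketch only makes the two control flows concrete. Partitioning uses a median-of-three pivot, the idea suggested in exercise 5.2.2-55, in a simple form of our own.

```python
CUTOFF = 9  # assumed threshold for "short" subfiles; the best value is machine-dependent


def insertion_sort(a, lo, hi):
    """Straight insertion on a[lo..hi] inclusive."""
    for j in range(lo + 1, hi + 1):
        k, i = a[j], j - 1
        while i >= lo and a[i] > k:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = k


def partition(a, lo, hi):
    """Partition a[lo..hi] around the median of a[lo], a[mid], a[hi].

    Returns (j, i): keys in a[lo..j] are <= pivot, keys in a[i..hi] are >= pivot,
    and any keys strictly between j and i equal the pivot.
    """
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    pivot = a[mid]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return j, i


def hybrid_quicksort(a, defer=False):
    """Sort list a in place; defer=True postpones small subfiles to one final pass."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= CUTOFF:
            if not defer:
                insertion_sort(a, lo, hi)   # sort now, while these keys were just touched
            continue
        j, i = partition(a, lo, hi)
        # Push the larger part first so the smaller one is handled next.
        if j - lo > hi - i:
            stack.append((lo, j))
            stack.append((i, hi))
        else:
            stack.append((i, hi))
            stack.append((lo, j))
    if defer:
        insertion_sort(a, 0, len(a) - 1)    # one final pass, as in step Q9
    return a
```

Both settings produce the same sorted output; they differ only in when each short subfile is touched again.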

What  is  the  current  state  of  the  art  for  sorting  large  amounts  of  data?  One 
popular benchmark since 1985 has been the task of sorting one million
100-character records that have uniformly random 10-character keys. The input and
output  are  supposed  to  reside  on  disk,  and  the  objective  is  to  minimize  the  total 
elapsed  time,  including  the  time  it  takes  to  launch  the  program.  R.  C.  Agarwal 
[SIGMOD  Record  25,2  (June  1996),  240-246]  used  a desktop  RISC  computer, 
the  IBM  RS/6000  model  39H,  to  implement  radix  sorting  with  files  that  were 
striped  on  8 disk  units,  and  he  finished  this  task  in  5.1  seconds.  Input/output  was 
the  main  bottleneck;  indeed,  the  processor  needed  only  0.6  seconds  to  control  the 
actual  sorting!  Even  faster  times  have  been  achieved  when  several  processors  are 
available. A network of 32 UltraSPARC I workstations, each with two internal
disks, can sort a million records in 2.41 seconds using a hybrid method called
NOW-Sort [A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, D. E. Culler, J. M.
Hellerstein, and D. A. Patterson, SIGMOD Record 26,2 (June 1997), 243-254].

Such  advances  mean  that  the  million-record  benchmark  has  become  mostly 
a test  of  startup  and  shutdown  time;  larger  data  sets  are  needed  to  give  more 
meaningful results. For example, the present world record for terabyte sorting
($10^{10}$ records of 100 characters each) is 2.5 hours, achieved in September 1997 on
a Silicon Graphics Origin2000 system with 32 processors, 8 gigabytes of internal
memory,  and  559  disks  of  4 gigabytes  each.  This  record  was  set  by  a commercially 
available  sorting  routine  called  Nsort™,  developed  by  C.  Nyberg,  C.  Koester,  and 
J.  Gray  using  methods  that  have  not  yet  been  published. 

Perhaps  even  the  terabyte  benchmark  will  be  considered  too  small  some  day. 
The best current candidate for a benchmark that will live forever is MinuteSort:
How many 100-character records can be sorted in 60 seconds? As this book
went to press, the current record holder for this task was NOW-Sort; 95
workstations needed only 59.21 seconds to put 90.25 million records into order, on 30
March  1997.  But  present-day  methods  are  not  yet  pushing  up  against  any  truly 
fundamental  limitations  on  speed. 

In  summary,  the  problem  of  efficient  sorting  remains  just  as  fascinating  today 
as  it  ever  was. 

EXERCISES 

1. [05] Summarize the contents of this chapter by stating a generalization of
Theorem 5.4.6A.

2.  [20]  Based  on  the  information  in  Table  1,  what  is  the  best  list-sorting  method  for 
six-digit  keys,  for  use  on  the  MIX  computer? 

3. [37] (Stable sorting in minimum storage.) A sorting algorithm is said to require
minimum storage if it uses only $O((\log N)^2)$ bits of memory space for its variables
besides the space needed to store the N records. The algorithm must be general in
the sense that it works for all N, not just for a particular value of N, assuming that
a sufficient amount of random access memory has been made available whenever the
algorithm is actually called upon to sort.

Many of the sorting methods we have studied violate this minimum-storage requirement;
in particular, the use of N link fields is forbidden. Quicksort (Algorithm
5.2.2Q) satisfies the minimum-storage requirement, but its worst-case running time is
proportional to $N^2$. Heapsort (Algorithm 5.2.3H) is the only $O(N \log N)$ algorithm we
have studied that uses minimum storage, although another such algorithm could be
formulated using the idea of exercise 5.2.4-18.

The fastest general algorithm we have considered that sorts keys in a stable manner
is the list merge sort (Algorithm 5.2.4L), but it does not use minimum storage. In fact,
the only stable minimum-storage sorting algorithms we have seen are $\Omega(N^2)$ methods
(straight insertion, bubble sorting, and a variant of straight selection).

Design a stable minimum-storage sorting algorithm that needs only $O(N(\log N)^2)$
units of time in its worst case. [Hint: It is possible to do stable minimum-storage
merging, namely, sorting when there are at most two runs, in $O(N \log N)$ units of time.]




► 4.  [28]  A sorting  algorithm  is  called  parsimonious  if  it  makes  decisions  entirely  by 
comparing  keys,  and  if  it  never  makes  a comparison  whose  outcome  could  have  been 
predicted  from  the  results  of  previous  comparisons.  Which  of  the  methods  listed  in 
Table  1 are  parsimonious? 

5.  [46]  It  is  much  more  difficult  to  sort  nonrandom  data  with  numerous  equal  keys 
than  to  sort  uniformly  random  data.  Devise  a sorting  benchmark  that  (i)  is  interesting 
now  and  will  probably  be  interesting  100  years  from  now;  (ii)  does  not  involve  uniformly 
random  keys;  and  (iii)  does  not  use  data  sets  that  change  with  time. 


I shall have accomplished my purpose if I have sorted and put in logical order
the gist of the great volume of material which has been generated about sorting
over the past few years.

— J.  C.  HOSKEN  (1955) 


CHAPTER  SIX 


SEARCHING 


Let’s  look  at  the  record. 
— AL  SMITH  (1928) 


This chapter might have been given the more pretentious title “Storage and
Retrieval of Information”; on the other hand, it might simply have been called
“Table Look-Up.” We are concerned with the process of collecting information
in a computer’s memory, in such a way that the information can subsequently be
recovered  as  quickly  as  possible.  Sometimes  we  are  confronted  with  more  data 
than  we  can  really  use,  and  it  may  be  wisest  to  forget  and  to  destroy  most  of  it; 
but  at  other  times  it  is  important  to  retain  and  organize  the  given  facts  in  such 
a way  that  fast  retrieval  is  possible. 

Most  of  this  chapter  is  devoted  to  the  study  of  a very  simple  search  problem: 
how  to  find  the  data  that  has  been  stored  with  a given  identification.  For 
example, in a numerical application we might want to find $f(x)$, given $x$ and
a table of the values of $f$; in a nonnumerical application, we might want to find
the  English  translation  of  a given  Russian  word. 

In  general,  we  shall  suppose  that  a set  of  N records  has  been  stored,  and 
the  problem  is  to  locate  the  appropriate  one.  As  in  the  case  of  sorting,  we 
assume  that  each  record  includes  a special  field  called  its  key;  this  terminology 
is  especially  appropriate,  because  many  people  spend  a great  deal  of  time  every 
day  searching  for  their  keys.  We  generally  require  the  N keys  to  be  distinct,  so 
that  each  key  uniquely  identifies  its  record.  The  collection  of  all  records  is  called 
a table  or  file,  where  the  word  “table”  is  usually  used  to  indicate  a small  file, 
and  “file”  is  usually  used  to  indicate  a large  table.  A large  file  or  a group  of  files 
is  frequently  called  a database. 

Algorithms  for  searching  are  presented  with  a so-called  argument,  K,  and  the 
problem  is  to  find  which  record  has  K as  its  key.  After  the  search  is  complete, 
two  possibilities  can  arise:  Either  the  search  was  successful,  having  located  the 
unique  record  containing  K;  or  it  was  unsuccessful,  having  determined  that  K 
is nowhere to be found. After an unsuccessful search it is sometimes desirable to
enter a new record, containing K, into the table; a method that does this is called
a search-and-insertion algorithm. Some hardware devices known as associative
memories  solve  the  search  problem  automatically,  in  a way  that  might  resemble 
the  functioning  of  a human  brain;  but  we  shall  study  techniques  for  searching 
on  a conventional  general-purpose  digital  computer. 

Although  the  goal  of  searching  is  to  find  the  information  stored  in  the  record 
associated  with  K,  the  algorithms  in  this  chapter  generally  ignore  everything  but 
the keys themselves. In practice we can find the associated data once we have
located K; for example, if K appears in location TABLE + i, the associated data
(or a pointer to it) might be in location TABLE + i + 1, or in DATA + i, etc. It is
therefore convenient to gloss over the details of what should be done after K has
been successfully found.

Searching  is  the  most  time-consuming  part  of  many  programs,  and  the 
substitution  of  a good  search  method  for  a bad  one  often  leads  to  a substantial 
increase  in  speed.  In  fact  we  can  often  arrange  the  data  or  the  data  structure 
so  that  searching  is  eliminated  entirely,  by  ensuring  that  we  always  know  just 
where  to  find  the  information  we  need.  Linked  memory  is  a common  way  to 
achieve  this;  for  example,  a doubly  linked  list  makes  it  unnecessary  to  search  for 
the  predecessor  or  successor  of  a given  item.  Another  way  to  avoid  searching 
occurs  if  we  are  allowed  to  choose  the  keys  freely,  since  we  might  as  well  let 
them be the numbers $\{1, 2, \ldots, N\}$; then the record containing K can simply
be placed in location TABLE + K. Both of these techniques were used to eliminate
searching from the topological sorting algorithm discussed in Section 2.2.3.
However,  searches  would  have  been  necessary  if  the  objects  in  the  topological 
sorting  algorithm  had  been  given  symbolic  names  instead  of  numbers.  Efficient 
algorithms  for  searching  turn  out  to  be  quite  important  in  practice. 

Search  methods  can  be  classified  in  several  ways.  We  might  divide  them 
into  internal  versus  external  searching,  just  as  we  divided  the  sorting  algorithms 
of  Chapter  5 into  internal  versus  external  sorting.  Or  we  might  divide  search 
methods  into  static  versus  dynamic  searching,  where  “static”  means  that  the 
contents of the table are essentially unchanging (so that it is important to minimize
the search time without regard for the time required to set up the table),
and  “dynamic”  means  that  the  table  is  subject  to  frequent  insertions  and  perhaps 
also  deletions.  A third  possible  scheme  is  to  classify  search  methods  according  to 
whether  they  are  based  on  comparisons  between  keys  or  on  digital  properties  of 
the  keys,  analogous  to  the  distinction  between  sorting  by  comparison  and  sorting 
by  distribution.  Finally  we  might  divide  searching  into  those  methods  that  use 
the  actual  keys  and  those  that  work  with  transformed  keys. 

The  organization  of  this  chapter  is  essentially  a combination  of  the  latter  two 
modes  of  classification.  Section  6.1  considers  “brute  force”  sequential  methods  of 
search,  then  Section  6.2  discusses  the  improvements  that  can  be  made  based  on 
comparisons between keys, using alphabetic or numeric order to govern the decisions.
Section 6.3 treats digital searching, and Section 6.4 discusses an important
class  of  methods  called  hashing  techniques,  based  on  arithmetic  transformations 
of  the  actual  keys.  Each  of  these  sections  treats  both  internal  and  external 
searching,  in  both  the  static  and  the  dynamic  case;  and  each  section  points  out 
the  relative  advantages  and  disadvantages  of  the  various  algorithms. 

Searching  and  sorting  are  often  closely  related  to  each  other.  For  example, 
consider the following problem: Given two sets of numbers, $A = \{a_1, a_2, \ldots, a_m\}$
and $B = \{b_1, b_2, \ldots, b_n\}$, determine whether or not $A \subseteq B$. Three solutions
suggest themselves:

1. Compare each $a_i$ sequentially with the $b_j$'s until finding a match.

2. Sort the $a$'s and $b$'s, then make one sequential pass through both files,
checking the appropriate condition.

3. Enter the $b_j$'s in a table, then search for each of the $a_i$.

Each of these solutions is attractive for a different range of values of m and n.
Solution 1 will take roughly $c_1mn$ units of time, for some constant $c_1$, and
solution 2 will take about $c_2(m \lg m + n \lg n)$ units, for some (larger) constant $c_2$.
With a suitable hashing method, solution 3 will take roughly $c_3m + c_4n$ units of
time, for some (still larger) constants $c_3$ and $c_4$. It follows that solution 1 is good
for very small m and n, but solution 2 soon becomes better as m and n grow
larger. Eventually solution 3 becomes preferable, until n exceeds the internal
memory size; then solution 2 is usually again superior until n gets much larger
still. Thus we have a situation where sorting is sometimes a good substitute for
searching, and searching is sometimes a good substitute for sorting.
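As a rough illustration, the three solutions can be sketched in Python. This is a minimal sketch, not a tuned implementation: it assumes A and B are given as Python sequences of numbers, the function names are hypothetical, and Python's built-in `set` stands in for "a suitable hashing method" in solution 3.

```python
from typing import Sequence


def subset_by_scanning(a: Sequence[int], b: Sequence[int]) -> bool:
    """Solution 1: compare each a_i with the b_j's in turn (about c1*m*n steps)."""
    return all(any(x == y for y in b) for x in a)


def subset_by_sorting(a: Sequence[int], b: Sequence[int]) -> bool:
    """Solution 2: sort both files, then make one merge-like pass."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    j = 0
    for x in a_sorted:
        # Skip the b's that are smaller than the current a.
        while j < len(b_sorted) and b_sorted[j] < x:
            j += 1
        if j == len(b_sorted) or b_sorted[j] != x:
            return False  # x does not occur in B
    return True


def subset_by_hashing(a: Sequence[int], b: Sequence[int]) -> bool:
    """Solution 3: enter the b_j's in a hash table, then look up each a_i."""
    table = set(b)
    return all(x in table for x in a)
```

All three functions give the same answer; which one is fastest depends on m and n, exactly as the discussion above describes.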

More  complicated  search  problems  can  often  be  reduced  to  the  simpler  case 
considered  here.  For  example,  suppose  that  the  keys  are  words  that  might  be 
slightly  misspelled;  we  might  want  to  find  the  correct  record  in  spite  of  this 
error.  If  we  make  two  copies  of  the  file,  one  in  which  the  keys  are  in  normal 
lexicographic  order  and  another  in  which  they  are  ordered  from  right  to  left  (as 
if  the  words  were  spelled  backwards),  a misspelled  search  argument  will  probably 
agree  up  to  half  or  more  of  its  length  with  an  entry  in  one  of  these  two  files.  The 
search  methods  of  Sections  6.2  and  6.3  can  therefore  be  adapted  to  find  the  key 
that  was  probably  intended. 
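The two-file idea can be sketched as follows. This is only one plausible reading of the text, offered as a sketch: it assumes the file is a list of lowercase words held in memory, uses Python's `bisect` module to search each sorted copy, and relies on the fact that in a sorted list the entry sharing the longest prefix with a probe is one of the probe's two neighbors. The class and method names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def common_prefix_len(s: str, t: str) -> int:
    """Number of leading characters that s and t have in common."""
    n = 0
    for x, y in zip(s, t):
        if x != y:
            break
        n += 1
    return n


class TwoWayIndex:
    """Keeps the keys in normal order and also in right-to-left order."""

    def __init__(self, words: List[str]) -> None:
        self.forward = sorted(words)
        self.backward = sorted(w[::-1] for w in words)

    @staticmethod
    def _best_neighbor(table: List[str], probe: str) -> Tuple[Optional[str], int]:
        # The entry sharing the longest prefix with `probe` sits next to
        # the position where `probe` would be inserted.
        i = bisect.bisect_left(table, probe)
        best, best_len = None, -1
        for j in (i - 1, i):
            if 0 <= j < len(table):
                k = common_prefix_len(table[j], probe)
                if k > best_len:
                    best, best_len = table[j], k
        return best, best_len

    def closest(self, query: str) -> Optional[str]:
        """Return the key agreeing with `query` on the longest prefix or suffix."""
        fwd, fwd_len = self._best_neighbor(self.forward, query)
        bwd, bwd_len = self._best_neighbor(self.backward, query[::-1])
        if bwd is None or (fwd is not None and fwd_len >= bwd_len):
            return fwd
        return bwd[::-1]
```

A misspelling near the end of a word is caught by the forward copy, and one near the beginning by the reversed copy; for example, "gnuth" shares only "g" with its forward neighbors but agrees with "knuth" on its last four letters.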

A related  problem  has  received  considerable  attention  in  connection  with 
airline  reservation  systems,  and  in  other  applications  involving  people’s  names 
when  there  is  a good  chance  that  the  name  will  be  misspelled  due  to  poor 
handwriting  or  voice  transmission.  The  goal  is  to  transform  the  argument  into 
some  code  that  tends  to  bring  together  all  variants  of  the  same  name.  The 
following  contemporary  form  of  the  “Soundex”  method,  a technique  that  was 
originally  developed  by  Margaret  K.  Odell  and  Robert  C.  Russell  [see  U.S. 
Patents  1261167  (1918),  1435663  (1922)],  has  often  been  used  for  encoding 
surnames: 

1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o,
u, w, y in other positions.

2. Assign the following numbers to the remaining letters after the first:

   b, f, p, v → 1                  l → 4
   c, g, j, k, q, s, x, z → 2      m, n → 5
   d, t → 3                        r → 6

3. If two or more letters with the same code were adjacent in the original name
(before step 1), or adjacent except for intervening h's and w's, omit all but
the first.

4. Convert to the form “letter, digit, digit, digit” by adding trailing zeros (if
there are fewer than three digits), or by dropping rightmost digits (if there
are more than three).

For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, Lukasiewicz, and
Wachs have the respective codes E460, G200, H416, K530, L300, L222, W200.
Of  course  this  system  will  bring  together  names  that  are  somewhat  different, 
as  well  as  names  that  are  similar;  the  same  seven  codes  would  be  obtained  for 
Ellery,  Ghosh,  Heilbronn,  Kant,  Liddy,  Lissajous,  and  Waugh.  And  on  the  other 
hand  a few  related  names  like  Rogers  and  Rodgers,  or  Sinclair  and  St.  Clair,  or 
Tchebysheff  and  Chebyshev,  remain  separate.  But  by  and  large  the  Soundex 
code  greatly  increases  the  chance  of  finding  a name  in  one  of  its  disguises.  [For 
further information, see C. P. Bourne and D. F. Ford, JACM 8 (1961), 538-552;
Leon Davidson, CACM 5 (1962), 169-171; Federal Population Censuses
1790-1890 (Washington, D.C.: National Archives, 1971), 90.]
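The four Soundex steps translate directly into code. The following Python function is a sketch of the variant described above (first letter kept; a, e, h, i, o, u, w, y dropped elsewhere; equal adjacent codes merged, with h and w but not vowels allowed between them). It assumes the input consists of English letters plus ignorable punctuation, and the names `CODES` and `soundex` are our own.

```python
# Digit assigned to each consonant in step 2; letters absent from this
# table (a, e, h, i, o, u, w, y) get no digit.
CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")
    first = letters[0]
    digits = []
    # Code of the most recent letter that was not h or w; the first letter
    # counts too, so "Lloyd" drops its second l (step 3).
    prev_code = CODES.get(first)
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(ch)  # None for a, e, i, o, u, y (step 1)
        if code is not None and code != prev_code:
            digits.append(code)  # steps 2 and 3
        prev_code = code  # a vowel breaks adjacency
    # Step 4: pad with zeros, or truncate to a letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]
```

Because a vowel between two letters with the same code keeps both codes, Lukasiewicz yields L222 instead of collapsing to L200.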

When  using  a scheme  like  Soundex,  we  need  not  give  up  the  assumption 
that  all  keys  are  distinct;  we  can  make  lists  of  all  records  with  equivalent  codes, 
treating  each  list  as  a unit. 

Large  databases  tend  to  make  the  retrieval  process  more  complex,  since 
people  often  want  to  consider  many  different  fields  of  each  record  as  potential 
keys,  with  the  ability  to  locate  items  when  only  part  of  the  key  information  is 
specified.  For  example,  given  a large  file  about  stage  performers,  a producer 
might  wish  to  find  all  unemployed  actresses  between  25  and  30  with  dancing 
talent  and  a French  accent;  given  a large  file  of  baseball  statistics,  a sportswriter 
may  wish  to  determine  the  total  number  of  runs  scored  by  the  Chicago  White 
Sox  in  1964,  during  the  seventh  inning  of  night  games,  against  left-handed 
pitchers.  Given  a large  file  of  data  about  anything,  people  like  to  ask  arbitrarily 
complicated  questions.  Indeed,  we  might  consider  an  entire  library  as  a database, 
and  a searcher  may  want  to  find  everything  that  has  been  published  about 
information  retrieval.  An  introduction  to  the  techniques  for  such  secondary  key 
(multi-attribute)  retrieval  problems  appears  below  in  Section  6.5. 

Before  entering  into  a detailed  study  of  searching,  it  may  be  helpful  to  put 
things  in  historical  perspective.  During  the  pre-computer  era,  many  books  of 
logarithm  tables,  trigonometry  tables,  etc.,  were  compiled,  so  that  mathematical 
calculations could be replaced by searching. Eventually these tables were transferred
to punched cards, and used for scientific problems in connection with
collators, sorters, and duplicating punch machines. But when stored-program
computers were introduced, it soon became apparent that it was now cheaper to
recompute $\log x$ or $\cos x$ each time, instead of looking up the answer in a table.

Although  the  problem  of  sorting  received  considerable  attention  already  in 
the  earliest  days  of  computers,  comparatively  little  was  done  about  algorithms 
for  searching.  With  small  internal  memories,  and  with  nothing  but  sequential 
media  like  tapes  for  storing  large  files,  searching  was  either  trivially  easy  or 
almost  impossible. 

But  the  development  of  larger  and  larger  random-access  memories  during 
the  1950s  eventually  led  to  the  recognition  that  searching  was  an  interesting 
problem  in  its  own  right.  After  years  of  complaining  about  the  limited  amounts 
of  space  in  the  early  machines,  programmers  were  suddenly  confronted  with 
larger  amounts  of  memory  than  they  knew  how  to  use  efficiently. 
The first surveys of the searching problem were published by A. I. Dumey,
Computers & Automation 5,12 (December 1956), 6-9; W. W. Peterson, IBM
J. Research & Development 1 (1957), 130-146; A. D. Booth, Information and
Control  1 (1958),  159-164;  A.  S.  Douglas,  Comp.  J.  2 (1959),  1-9.  More 
extensive  treatments  were  given  later  by  Kenneth  E.  Iverson,  A Programming 
Language  (New  York:  Wiley,  1962),  133-158,  and  by  Werner  Buchholz,  IBM 
Systems  J.  2 (1963),  86-111. 

During  the  early  1960s,  a number  of  interesting  new  search  procedures  based 
on  tree  structures  were  introduced,  as  we  shall  see;  and  research  about  searching 
is  still  actively  continuing  at  the  present  time. 


6.1.  SEQUENTIAL  SEARCHING 

“Begin  at  the  beginning,  and  go  on  till  you  find  the  right  key;  then  stop.” 
This  sequential  procedure  is  the  obvious  way  to  search,  and  it  makes  a useful 
starting  point  for  our  discussion  of  searching  because  many  of  the  more  intricate 
algorithms  are  based  on  it.  We  shall  see  that  sequential  searching  involves  some 
very  interesting  ideas,  in  spite  of  its  simplicity. 

The  algorithm  might  be  formulated  more  precisely  as  follows: 

Algorithm S (Sequential search). Given a table of records $R_1, R_2, \ldots, R_N$,
whose respective keys are $K_1, K_2, \ldots, K_N$, this algorithm searches for a given
argument K. We assume that $N \ge 1$.

S1. [Initialize.] Set $i \leftarrow 1$.

S2. [Compare.] If $K = K_i$, the algorithm terminates successfully.

S3. [Advance.] Increase i by 1.

S4. [End of file?] If $i \le N$, go back to S2. Otherwise the algorithm terminates
unsuccessfully. |

Notice  that  this  algorithm  can  terminate  in  two  different  ways,  successfully 
(having  located  the  desired  key)  or  unsuccessfully  (having  established  that  the 
given  argument  is  not  present  in  the  table).  The  same  will  be  true  of  most  other 
algorithms  in  this  chapter. 
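For comparison with the MIX version, here is Algorithm S transcribed step by step into Python. It is a sketch: the 1-based index i mirrors the text, the function name is hypothetical, and the function also returns the number of key comparisons C so that the quantities used in the analysis can be checked.

```python
from typing import List, Optional, Tuple


def sequential_search(keys: List[int], k: int) -> Tuple[Optional[int], int]:
    """Algorithm S. Returns (i, C): the 1-based position of k (or None)
    and the number of key comparisons made."""
    n = len(keys)
    if n == 0:
        return None, 0  # the text assumes N >= 1; treat an empty table as a miss
    comparisons = 0
    i = 1                          # S1. Initialize.
    while True:
        comparisons += 1
        if k == keys[i - 1]:       # S2. Compare.
            return i, comparisons
        i += 1                     # S3. Advance.
        if i > n:                  # S4. End of file? (a second test per iteration)
            return None, comparisons
```

A successful search for $K_i$ makes $C = i$ comparisons and an unsuccessful one makes $C = N$, in agreement with the analysis of Program S.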


[Figure: flowchart with exits labeled SUCCESS and FAILURE.]
Fig. 1. Sequential or “house-to-house” search.
A MIX program can be written down immediately.

Program S (Sequential search). Assume that $K_i$ appears in location KEY + $i$,
and that the remainder of record $R_i$ appears in location INFO + $i$. The following
program uses rA ≡ K, rI1 ≡ i − N.

START   LDA   K          1      S1. Initialize.
        ENT1  1-N        1      i ← 1.
2H      CMPA  KEY+N,1    C      S2. Compare.
        JE    SUCCESS    C      Exit if K = K_i.
        INC1  1          C-S    S3. Advance.
        J1NP  2B         C-S    S4. End of file?
FAILURE EQU   *          1-S    Exit if not in table.

At location SUCCESS, the instruction “LDA INFO+N,1” will now bring the desired
information into rA. |

The analysis of this program is straightforward; it shows that the running
time of Algorithm S depends on two things,

C = the number of key comparisons;
S = 1 if successful, 0 if unsuccessful.        (1)

Program S takes $5C - 2S + 3$ units of time. If the search successfully finds
$K = K_i$, we have $C = i$, $S = 1$; hence the total time is $(5i + 1)u$. On the other
hand if the search is unsuccessful, we have $C = N$, $S = 0$, for a total time of
$(5N + 3)u$. If every input key occurs with equal probability, the average value
of C in a successful search will be

$$\frac{1 + 2 + \cdots + N}{N} = \frac{N + 1}{2}; \qquad (2)$$

the standard deviation is, of course, rather large, about $0.289N$ (see exercise 1).

The  algorithm  above  is  surely  familiar  to  all  programmers.  But  too  few 
people  know  that  it  is  not  always  the  right  way  to  do  a sequential  search!  A 
straightforward  change  makes  the  algorithm  faster,  unless  the  list  of  records  is 
quite  short: 

Algorithm Q (Quick sequential search). This algorithm is the same as Algorithm S,
except that it assumes the presence of a dummy record $R_{N+1}$ at the
end of the file.

Q1. [Initialize.] Set $i \leftarrow 1$, and set $K_{N+1} \leftarrow K$.

Q2. [Compare.] If $K = K_i$, go to Q4.

Q3. [Advance.] Increase i by 1 and return to Q2.

Q4. [End of file?] If $i \le N$, the algorithm terminates successfully; otherwise it
terminates unsuccessfully ($i = N + 1$). |
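In a high-level language the dummy record can be simulated by temporarily appending K to the list, so that the inner loop performs a single test. The following Python sketch assumes the caller's list may be modified for the duration of the call (the sentinel is removed again before returning); the function name is hypothetical.

```python
from typing import List, Optional


def quick_sequential_search(keys: List[int], k: int) -> Optional[int]:
    """Algorithm Q: return the 1-based position of k in keys, or None."""
    n = len(keys)
    keys.append(k)                   # Q1. Set K_{N+1} <- K (the dummy record).
    try:
        i = 1
        while k != keys[i - 1]:      # Q2. Compare; the sentinel guarantees a stop.
            i += 1                   # Q3. Advance.
    finally:
        keys.pop()                   # Remove the dummy record again.
    return i if i <= n else None     # Q4. End of file?
```

Appending and popping are amortized constant-time operations on Python lists, so the sentinel costs almost nothing; this plays the role of reserving location KEY + N + 1 in the MIX program.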

Program Q (Quick sequential search). rA ≡ K, rI1 ≡ i − N.

START   LDA   K          1        Q1. Initialize.
        STA   KEY+N+1    1        K_{N+1} ← K.
        ENT1  -N         1        i ← 0.
        INC1  1          C+1-S    Q3. Advance.
        CMPA  KEY+N,1    C+1-S    Q2. Compare.
        JNE   *-2        C+1-S    To Q3 if K_i ≠ K.
        J1NP  SUCCESS    1        Q4. End of file?
FAILURE EQU   *          1-S      Exit if not in table.

In terms of the quantities C and S in the analysis of Program S, the running
time has decreased to $(4C - 4S + 10)u$; this is an improvement whenever $C \ge 6$
in a successful search, and whenever $N \ge 8$ in an unsuccessful search (compare
$5C + 1$ with $4C + 6$, and $5N + 3$ with $4N + 10$).

The transition from Algorithm S to Algorithm Q makes use of an important
speed-up principle: When an inner loop of a program tests two or more
conditions,  we  should  try  to  reduce  the  testing  to  just  one  condition. 

Another  technique  will  make  Program  Q still  faster. 

Program Q' (Quicker sequential search). rA ≡ K, rI1 ≡ i − N.

START   LDA   K          1              Q1. Initialize.
        STA   KEY+N+1    1              K_{N+1} ← K.
        ENT1  -1-N       1              i ← −1.
3H      INC1  2          ⌊(C-S+2)/2⌋    Q3. Advance. (twice)
        CMPA  KEY+N,1    ⌊(C-S+2)/2⌋    Q2. Compare.
        JE    4F         ⌊(C-S+2)/2⌋    To Q4 if K = K_i.
        CMPA  KEY+N+1,1  ⌊(C-S+1)/2⌋    Q2. Compare. (next)
        JNE   3B         ⌊(C-S+1)/2⌋    To Q3 if K ≠ K_{i+1}.
        INC1  1          (C-S) mod 2    Advance i.
4H      J1NP  SUCCESS    1              Q4. End of file?
FAILURE EQU   *          1-S            Exit if not in table.

The inner loop has been duplicated; this avoids about half of the “$i \leftarrow i + 1$”
instructions, so it reduces the running time to

$$3.5C - 3.5S + 10 + \frac{(C - S) \bmod 2}{2}$$

units. We have saved 30 percent of the running time of Program S, when large
tables  are  being  searched;  many  existing  programs  can  be  improved  in  this  way. 
The  same  ideas  apply  to  programming  in  high-level  languages.  [See,  for  example, 
D.  E.  Knuth,  Computing  Surveys  6 (1974),  266-269.] 
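The loop-duplication trick of Program Q' can be expressed in a high-level language too. The Python sketch below combines the sentinel with a loop that examines two keys per iteration; it assumes, as before, that the list may be extended temporarily, and the function name is hypothetical. In an interpreted language the saving is much smaller than in MIX, so the sketch shows the structure of the transformation rather than promising a 30 percent speedup.

```python
from typing import List, Optional


def quicker_sequential_search(keys: List[int], k: int) -> Optional[int]:
    """Sequential search with a sentinel and the inner loop unrolled twice."""
    n = len(keys)
    keys.append(k)                 # dummy record K_{N+1} = K
    try:
        i = 0                      # 0-based position of the next key to test
        while True:
            if keys[i] == k:       # first copy of the loop body
                break
            if keys[i + 1] == k:   # second copy; i < n here, so i + 1 <= n is valid
                i += 1
                break
            i += 2                 # advance twice, saving half of the increments
    finally:
        keys.pop()
    return i + 1 if i < n else None
```

The second test can never run past the sentinel: it is reached only when `keys[i] != k`, which rules out `i == n`.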

A slight  variation  of  the  algorithm  is  appropriate  if  we  know  that  the  keys 
are  in  increasing  order: 

Algorithm T (Sequential search in ordered table). Given a table of records
$R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$,
this algorithm searches for a given argument K. For convenience and speed,
the algorithm assumes that there is a dummy record $R_{N+1}$ whose key value is
$K_{N+1} = \infty > K$.

T1. [Initialize.] Set $i \leftarrow 1$.

T2. [Compare.] If $K \le K_i$, go to T4.

T3. [Advance.] Increase i by 1 and return to T2.

T4. [Equality?] If $K = K_i$, the algorithm terminates successfully. Otherwise it
terminates unsuccessfully. |

If  all  input  keys  are  equally  likely,  this  algorithm  takes  essentially  the  same 
average  time  as  Algorithm  Q,  for  a successful  search.  But  unsuccessful  searches 
are  performed  about  twice  as  fast,  since  the  absence  of  a record  can  be  established 
more  quickly. 
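Here is Algorithm T in the same style. The sketch assumes finite numeric keys so that `math.inf` can play the role of the dummy key $K_{N+1} = \infty$ (other key types would need a different sentinel), and it reports how many keys were examined so that the early exit on unsuccessful searches is visible. The function name is hypothetical.

```python
import math
from typing import List, Optional, Tuple


def ordered_sequential_search(keys: List[float], k: float) -> Tuple[Optional[int], int]:
    """Algorithm T on a table with K_1 < K_2 < ... < K_N (k must be finite).
    Returns (position or None, number of keys examined)."""
    keys.append(math.inf)          # dummy record with K_{N+1} = infinity > K
    try:
        i = 1                      # T1. Initialize.
        while k > keys[i - 1]:     # T2. Stop as soon as K <= K_i.
            i += 1                 # T3. Advance.
        found = k == keys[i - 1]   # T4. Equality?
    finally:
        keys.pop()
    return (i if found else None), i
```

The search for 5 in the table 2, 4, 6, 8 stops after examining 6, without looking at the rest of the table.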

Each  of  the  algorithms  above  uses  subscripts  to  denote  the  table  entries.  It 
is  convenient  to  describe  the  methods  in  terms  of  these  subscripts,  but  the  same 
search  procedures  can  be  used  for  tables  that  have  a linked  representation,  since 
the  data  is  being  traversed  sequentially.  (See  exercises  2,  3,  and  4.) 
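Because the keys are visited strictly in order, the logic of Algorithm S carries over unchanged to a linked table. The sketch below uses a hypothetical `Node` class with `key`, `info`, and `next` fields to stand in for a linked representation; it is only an illustration and is not meant as a solution to the exercises cited.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Node:
    key: int
    info: str
    next: Optional["Node"] = None


def build_list(pairs: Iterable[Tuple[int, str]]) -> Optional[Node]:
    """Link (key, info) pairs together in the order given."""
    head: Optional[Node] = None
    tail: Optional[Node] = None
    for key, info in pairs:
        node = Node(key, info)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def linked_sequential_search(head: Optional[Node], k: int) -> Optional[Node]:
    """Follow the links from head; return the node with key k, or None."""
    node = head
    while node is not None:        # "end of file" test
        if node.key == k:          # key comparison
            return node
        node = node.next           # advance along the link
    return None
```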

Frequency  of  access.  So  far  we  have  been  assuming  that  every  argument  occurs 
as  often  as  every  other.  This  is  not  always  a realistic  assumption;  in  a general 
situation, key $K_j$ will occur with probability $p_j$, where $p_1 + p_2 + \cdots + p_N = 1$.
The time required to do a successful search is essentially proportional to the
number of comparisons, C, which now has the average value

$$C_N = p_1 + 2p_2 + \cdots + Np_N. \qquad (3)$$

If we have the option of putting the records into the table in any desired order,
this quantity $C_N$ is smallest when

$$p_1 \ge p_2 \ge \cdots \ge p_N, \qquad (4)$$

that is, when the most frequently used records appear near the beginning.

Let’s look at several probability distributions, in order to see how much of a
saving is possible when the records are arranged in the optimal manner specified
in (4). If $p_1 = p_2 = \cdots = p_N = 1/N$, formula (3) reduces to $C_N = (N + 1)/2$;
we have already derived this in Eq. (2). Suppose, on the other hand, that

$$p_1 = \tfrac{1}{2},\quad p_2 = \tfrac{1}{4},\quad \ldots,\quad p_{N-1} = \tfrac{1}{2^{N-1}},\quad p_N = \tfrac{1}{2^{N-1}}. \qquad (5)$$

Then $C_N = 2 - 2^{1-N}$, by exercise 7; the average number of comparisons is
less than two, for this distribution, if the records appear in the proper order
within the table.

Another probability distribution that suggests itself is

$$p_1 = Nc,\quad p_2 = (N-1)c,\quad \ldots,\quad p_N = c, \qquad \text{where } c = \frac{2}{N(N+1)}. \qquad (6)$$

This wedge-shaped distribution is not as dramatic a departure from uniformity
as (5). In this case we find

$$C_N = c\sum_{k=1}^{N} k(N + 1 - k) = \frac{N + 2}{3}; \qquad (7)$$

the optimum arrangement saves about one-third of the search time that would
have been obtained if the records had appeared in random order.
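Formulas (3), (5), and (7) are easy to check numerically. The Python sketch below uses exact rational arithmetic from the standard `fractions` module; the helper names are our own. It computes $C_N$ for the uniform, halving (5), and wedge-shaped (6) distributions, and checks on one example that reversing the order prescribed by (4) does not help.

```python
from fractions import Fraction
from typing import List


def mean_comparisons(probs: List[Fraction]) -> Fraction:
    """Formula (3): C_N = p_1 + 2 p_2 + ... + N p_N."""
    return sum((k * p for k, p in enumerate(probs, start=1)), Fraction(0))


def uniform(n: int) -> List[Fraction]:
    return [Fraction(1, n)] * n


def halving(n: int) -> List[Fraction]:
    """Distribution (5): 1/2, 1/4, ..., 1/2^(N-1), 1/2^(N-1)."""
    return [Fraction(1, 2 ** k) for k in range(1, n)] + [Fraction(1, 2 ** (n - 1))]


def wedge(n: int) -> List[Fraction]:
    """Distribution (6): Nc, (N-1)c, ..., c with c = 2/(N(N+1))."""
    c = Fraction(2, n * (n + 1))
    return [(n + 1 - k) * c for k in range(1, n + 1)]
```

Comparing $(N+2)/3$ from the wedge with $(N+1)/2$ from the uniform case exhibits the one-third saving directly.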
Of course the probability distributions in (5) and (6) are rather artificial,
and they may never be a very good approximation to reality. A more typical
sequence of probabilities, called “Zipf’s law,” has

$$p_1 = c/1,\quad p_2 = c/2,\quad \ldots,\quad p_N = c/N, \qquad \text{where } c = 1/H_N. \qquad (8)$$


This distribution was popularized by G. K. Zipf, who observed that the nth most
common word in natural language text seems to occur with a frequency approximately
proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.:
Houghton Mifflin, 1935); Human Behavior and the Principle of Least Effort
(Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon
in census tables, when metropolitan areas are ranked in order of decreasing
population. If Zipf’s law governs the frequency of the keys in a table, we have
immediately

$$C_N = N/H_N; \qquad (9)$$

searching such a file is about $\frac{1}{2}\ln N$ times faster than searching the same file
with randomly ordered records, since the ratio of (2) to (9) is $(N+1)H_N/(2N)$.
[See A. D. Booth, L. Brandwood, and J. P. Cleave, Mechanical Resolution of
Linguistic Problems (New York: Academic Press, 1958), 79.]
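A quick floating-point experiment illustrates (8) and (9). The sketch below (helper names our own) builds the Zipf probabilities for a given N, checks that formula (3) gives $N/H_N$, and compares the result with the $(N+1)/2$ comparisons needed for a randomly ordered file; the ratio $(N+1)H_N/(2N)$ grows like $\frac{1}{2}\ln N$.

```python
import math
from typing import List


def harmonic(n: int) -> float:
    return sum(1.0 / k for k in range(1, n + 1))


def zipf(n: int) -> List[float]:
    """Distribution (8): p_k = c/k with c = 1/H_N."""
    c = 1.0 / harmonic(n)
    return [c / k for k in range(1, n + 1)]


def mean_comparisons(probs: List[float]) -> float:
    """Formula (3)."""
    return sum(k * p for k, p in enumerate(probs, start=1))


def speedup_over_random_order(n: int) -> float:
    """(N+1)/2 divided by C_N for Zipf-distributed keys in optimal order."""
    return ((n + 1) / 2) / mean_comparisons(zipf(n))
```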

Another approximation to realistic distributions is the “80-20” rule of thumb
that has commonly been observed in commercial applications [see, for example,
W. P. Heising, IBM Systems J. 2 (1963), 114-115]. This rule states that 80 percent
of the transactions deal with the most active 20 percent of a file; and the
same rule applies in fractal fashion to the top 20 percent, so that 64 percent of
the transactions deal with the most active 4 percent, etc. In other words,

$$\frac{p_1 + p_2 + \cdots + p_{.20n}}{p_1 + p_2 + p_3 + \cdots + p_n} \approx .80 \qquad \text{for all } n. \qquad (10)$$
One distribution that satisfies this rule exactly whenever n is a multiple of 5 is

$$p_1 = c,\quad p_2 = (2^\theta - 1)c,\quad p_3 = (3^\theta - 2^\theta)c,\quad \ldots,\quad p_N = \bigl(N^\theta - (N-1)^\theta\bigr)c, \qquad (11)$$

where

$$c = 1/N^\theta, \qquad \theta = \frac{\log .80}{\log .20} \approx 0.1386, \qquad (12)$$

since $p_1 + p_2 + \cdots + p_n = cn^\theta$ for all n in this case. It is not especially easy
to work with the probabilities in (11); we have, however,
$n^\theta - (n-1)^\theta = \theta n^{\theta-1}\bigl(1 + O(1/n)\bigr)$, so there is a simpler distribution
that approximately fulfills the 80-20 rule, namely

$$p_1 = c/1^{1-\theta},\quad p_2 = c/2^{1-\theta},\quad \ldots,\quad p_N = c/N^{1-\theta}, \qquad \text{where } c = 1/H_N^{(1-\theta)}. \qquad (13)$$

Here $\theta = \log .80/\log .20$ as before, and $H_N^{(s)}$ is the Nth harmonic number of
order s, namely $1^{-s} + 2^{-s} + \cdots + N^{-s}$. Notice that this probability distribution
is very similar to that of Zipf’s law (8); as $\theta$ varies from 1 to 0, the probabilities
vary from a uniform distribution to a Zipfian one. Applying (3) to (13) yields

$$C_N = H_N^{(-\theta)}\big/H_N^{(1-\theta)} = \frac{\theta N}{\theta + 1} + O(N^{1-\theta}) \approx 0.122N \qquad (14)$$

as the mean number of comparisons for the 80-20 law (see exercise 8).
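The 80-20 formulas can be explored the same way. This Python sketch (function names our own) computes θ from (12), builds distribution (11) and checks property (10) for multiples of 5, and evaluates (3) for distribution (13). Because the error term in (14) is $O(N^{1-\theta})$ with θ only about 0.14, the ratio $C_N/N$ approaches $\theta/(1+\theta) \approx 0.122$ quite slowly, so for moderate N the computed value is noticeably larger.

```python
import math
from typing import List

THETA = math.log(0.80) / math.log(0.20)  # equation (12), about 0.1386


def eighty_twenty_exact(n: int, theta: float = THETA) -> List[float]:
    """Distribution (11): p_k = (k^theta - (k-1)^theta) c with c = 1/N^theta."""
    c = 1.0 / n ** theta
    return [(k ** theta - (k - 1) ** theta) * c for k in range(1, n + 1)]


def eighty_twenty_approx(n: int, theta: float = THETA) -> List[float]:
    """Distribution (13): p_k = c / k^(1-theta) with c = 1/H_N^(1-theta)."""
    weights = [k ** (theta - 1) for k in range(1, n + 1)]
    total = sum(weights)  # this sum is H_N^(1-theta)
    return [w / total for w in weights]


def mean_comparisons(probs: List[float]) -> float:
    """Formula (3)."""
    return sum(k * p for k, p in enumerate(probs, start=1))
```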

A study of word frequencies carried out by E. S. Schwartz [see the interesting
graph on page 422 of JACM 10 (1963)] suggests that distribution (13) with a
slightly negative value of $\theta$ gives a better fit to the data than Zipf’s law (8). In
this case the mean value

$$C_N = H_N^{(-\theta)}\big/H_N^{(1-\theta)} = \frac{N^{1+\theta}}{(1+\theta)\,\zeta(1-\theta)} + O(N^{1+2\theta}) \qquad (15)$$

is substantially smaller than (9) as $N \to \infty$.

Distributions like (11) and (13) were first studied by Vilfredo Pareto in
connection with disparities of personal income and wealth [Cours d’Économie
Politique 2 (Lausanne: Rouge, 1897), 304-312]. If $p_k$ is proportional to the
wealth of the kth richest individual, the probability that a person’s wealth
exceeds or equals x times the wealth of the poorest individual is $k/N$ when
$x = p_k/p_N$. Thus, when $p_k = ck^{\theta-1}$ and $x = (k/N)^{\theta-1}$, the stated probability
is $x^{-1/(1-\theta)}$; this is now called a Pareto distribution with parameter $1/(1-\theta)$.

Curiously,  Pareto  didn’t  understand  his  own  distribution;  he  believed  that 
a value of $\theta$ near 0 would correspond to a more egalitarian society than a
value near 1! His error was corrected by Corrado Gini [Atti della III Riunione
della Società Italiana per il Progresso delle Scienze (1910), reprinted in his
Memorie  di  Metodologia  Statistica  1 (Rome:  1955),  3-120],  who  was  the  first 
person  to  formulate  and  explain  the  significance  of  ratios  like  the  80-20  law  (10). 
People  still  tend  to  misunderstand  such  distributions;  they  often  speak  about  a 
“75-25  law”  or  a “90-10  law”  as  if  an  a-b  law  makes  sense  only  when  a + b = 100, 
while  (12)  shows  that  the  sum  80  + 20  is  quite  irrelevant. 

Another discrete distribution analogous to (11) and (13) was introduced by G. Udny Yule when he studied the increase in biological species as a function of time, assuming various models of evolution [Philos. Trans. B213 (1924), 21-87]. Yule's distribution applies when θ < 2:

$$p_1 = c,\quad p_2 = \frac{c}{2-\theta},\quad p_3 = \frac{2c}{(3-\theta)(2-\theta)},\quad \ldots,\quad p_N = \frac{(N-1)!\,c}{(N-\theta)\ldots(2-\theta)};$$
$$c = \frac{\theta}{1-\theta}\bigg/\left(\frac{N}{1-\theta}\binom{N-\theta}{N-1}^{-1} - 1\right). \qquad (16)$$

The limiting value $c = 1/H_N$ or $c = 1/N$ is used when θ = 0 or θ = 1.

A “self-organizing” file. These calculations with probabilities are very nice, but in most cases we don't know what the probabilities are. We could keep a count in each record of how often it has been accessed, reallocating the records on the basis of those counts; the formulas derived above suggest that this procedure would often lead to worthwhile savings. But we probably don't want to devote so much memory space to the count fields, since we can make better use of that memory by using one of the nonsequential search techniques that are explained later in this chapter.

A simple  scheme,  which  has  been  in  use  for  many  years  although  its  origin 
is  unknown,  can  be  used  to  keep  the  records  in  a pretty  good  order  without 
auxiliary  count  fields:  Whenever  a record  has  been  successfully  located,  it  is 
moved  to  the  front  of  the  table. 

The  idea  behind  this  “self-organizing”  technique  is  that  the  oft-used  items 
will  tend  to  be  located  fairly  near  the  beginning  of  the  table,  when  we  need  them. 
If we assume that the N keys occur with respective probabilities $\{p_1, p_2, \ldots, p_N\}$,
with  each  search  being  completely  independent  of  previous  searches,  it  can  be 
shown  that  the  average  number  of  comparisons  needed  to  find  an  item  in  such  a 
self-organizing  file  tends  to  the  limiting  value 


$$\bar C_N = 1 + 2\sum_{1\le i<j\le N}\frac{p_ip_j}{p_i+p_j} = \frac12 + \sum_{1\le i,j\le N}\frac{p_ip_j}{p_i+p_j}. \qquad (17)$$


(See exercise 11.) For example, if $p_i = 1/N$ for $1 \le i \le N$, the self-organizing table is always in completely random order, and this formula reduces to the familiar expression $(N+1)/2$ derived above. In general, the average number of comparisons (17) is always less than twice the optimal value (3): since $p_ip_j/(p_i+p_j) \le \min(p_i, p_j)$, numbering the keys so that $p_1 \ge p_2 \ge \cdots \ge p_N$ gives $\bar C_N \le 1 + 2\sum_{j=1}^{N}(j-1)p_j = 2C_N - 1$. In fact, $\bar C_N$ is always less than $\pi/2$ times the optimal value $C_N$ [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 (1988), 148-157]; this ratio is the best possible constant in general, since it is approached when $p_j$ is proportional to $1/j^2$.
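Formula (17) is easy to check numerically. The sketch below is an illustration, not part of the text: it evaluates (17) with exact fractions, computes the optimal value (3) by sorting the probabilities into decreasing order, and runs a seeded simulation of move-to-front under the text's assumption of independent searches. The names `mtf_limit`, `optimal_cost`, and `simulate_mtf` are hypothetical, and all $p_i$ are assumed positive so that no denominator $p_i + p_j$ vanishes.

```python
import random
from fractions import Fraction


def mtf_limit(p):
    """Limiting mean comparisons of move-to-front, Eq. (17).

    Assumes every p[i] > 0, so that p[i] + p[j] is never zero.
    """
    total = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            total += 2 * p[i] * p[j] / (p[i] + p[j])
    return total


def optimal_cost(p):
    """C_N of (3): keys arranged in order of decreasing probability."""
    ordered = sorted(p, reverse=True)
    return sum((k + 1) * pk for k, pk in enumerate(ordered))


def simulate_mtf(p, searches, seed=1):
    """Average comparisons of a simulated move-to-front table.

    Requests are drawn independently with probabilities p; the table starts
    in index order, so a short initial transient is included in the average.
    """
    rng = random.Random(seed)
    table = list(range(len(p)))          # record numbers, front of table first
    requests = rng.choices(range(len(p)), weights=[float(x) for x in p], k=searches)
    total = 0
    for key in requests:
        pos = table.index(key)           # sequential search costs pos + 1 comparisons
        total += pos + 1
        table.insert(0, table.pop(pos))  # move the located record to the front
    return total / searches


p = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
print(mtf_limit(p), optimal_cost(p))     # 321/140 2
print(simulate_mtf(p, 100_000))
```

With these probabilities, (17) gives 321/140 ≈ 2.29, comfortably below both $2C_N - 1 = 3$ and $(\pi/2)C_N \approx 3.14$; the simulated average should land close to 2.29.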

Let us see how well the self-organizing procedure works when the key probabilities obey Zipf's law (8). We have

$$\begin{aligned}
\bar C_N &= \frac12 + \sum_{1\le i,j\le N}\frac{(c/i)(c/j)}{c/i+c/j} = \frac12 + c\sum_{1\le i,j\le N}\frac{1}{i+j}\\
&= \frac12 + c\sum_{i=1}^{N}(H_{N+i}-H_i) = \frac12 + c\sum_{i=1}^{2N}H_i - 2c\sum_{i=1}^{N}H_i\\
&= \frac12 + c\bigl((2N+1)H_{2N} - 2N - 2(N+1)H_N + 2N\bigr)\\
&= \frac12 + c\bigl(N\ln 4 - \ln N + O(1)\bigr) \approx 2N/\lg N, \qquad (18)
\end{aligned}$$

by Eqs. 1.2.7-(8) and 1.2.7-(3). This is substantially better than ½N, when N is reasonably large, and it is only about ln 4 ≈ 1.386 times as many comparisons as would be obtained in the optimum arrangement; see (9).
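The chain of equalities in (18) can be spot-checked with exact rational arithmetic. The sketch below compares the double sum from (17), applied to Zipf's law $p_i = c/i$ with $c = 1/H_N$, against the closed form $\frac12 + c\bigl((2N+1)H_{2N} - 2N - 2(N+1)H_N + 2N\bigr)$, and then measures how close the ratio to the optimum $N/H_N$ of (9) comes to ln 4. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def harmonic(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def mtf_zipf_direct(n):
    """Eq. (17) summed term by term for p_i = c/i, c = 1/H_n."""
    c = 1 / harmonic(n)
    p = [c / i for i in range(1, n + 1)]
    total = Fraction(1, 2)
    for a in p:
        for b in p:
            total += a * b / (a + b)
    return total


def mtf_zipf_closed(n):
    """Closed form from (18): 1/2 + c((2N+1)H_{2N} - 2N - 2(N+1)H_N + 2N)."""
    c = 1 / harmonic(n)
    return Fraction(1, 2) + c * ((2 * n + 1) * harmonic(2 * n) - 2 * n
                                 - 2 * (n + 1) * harmonic(n) + 2 * n)


def ratio_to_optimum(n):
    """(18) divided by the optimum N/H_N of (9), in floating point."""
    h_n = math.fsum(1.0 / k for k in range(1, n + 1))
    h_2n = math.fsum(1.0 / k for k in range(1, 2 * n + 1))
    # The -2N and +2N terms of the closed form cancel.
    mtf = 0.5 + ((2 * n + 1) * h_2n - 2 * (n + 1) * h_n) / h_n
    return mtf / (n / h_n)


print(mtf_zipf_direct(12) == mtf_zipf_closed(12))   # True
print(ratio_to_optimum(2000), math.log(4))
```

The ratio approaches ln 4 only slowly, since the correction terms are of order $(\log N)/N$.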

Computational  experiments  involving  actual  compiler  symbol  tables  indicate 
that  the  self-organizing  method  works  even  better  than  our  formulas  predict, 
because  successive  searches  are  not  independent  (small  groups  of  keys  tend  to 
occur  in  bunches). 

This  self-organizing  scheme  was  first  analyzed  by  John  McCabe  [Operations 
Research  13  (1965),  609-618],  who  established  (17).  McCabe  also  introduced 
another interesting scheme, under which each successfully located key that is not already at the beginning of the table is simply interchanged with the preceding key, instead of being moved all the way to the front. He conjectured that the limiting average search time for this method, assuming independent searches, never exceeds (17). Several years later, Ronald L. Rivest proved in fact that the transposition method uses strictly fewer comparisons than the move-to-front method, in the long run, except of course when N ≤ 2 or when all the nonzero probabilities are equal [CACM 19 (1976), 63-67]. However, convergence to the asymptotic limit is much slower than for the move-to-front heuristic, so move-to-front is better unless the process is prolonged [J. R. Bitner, SICOMP 8 (1979), 82-110]. Moreover, J. L. Bentley, C. C. McGeoch, D. D. Sleator, and R. E. Tarjan have proved that the move-to-front method never makes more than four times the total number of memory accesses made by any algorithm on linear lists, given any sequence of accesses whatever to the data, even if the algorithm knows the future; the frequency-count and transposition methods do not have this property [CACM 28 (1985), 202-208, 404-411]. See SODA 8 (1997), 53-62, for an interesting empirical study of more than 40 heuristics for self-organizing lists, carried out by R. Bachrach and R. El-Yaniv.
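Rivest's comparison can be checked exactly for small N. For the transposition heuristic, a detailed-balance argument (not given in the text; it is an assumption of this sketch) suggests that the limiting probability of the arrangement with record σ(k) in position k is proportional to the product of $p_{\sigma(k)}^{N-k}$: swapping neighbors a and b multiplies this weight by $p_a/p_b$, matching the ratio of the two transition probabilities $p_b$ and $p_a$. The hypothetical functions below enumerate all N! arrangements with that weight and compare the result against (17).

```python
from fractions import Fraction
from itertools import permutations


def mtf_limit(p):
    """Eq. (17): limiting mean comparisons of move-to-front (all p[i] > 0)."""
    total = Fraction(1)
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            total += 2 * p[i] * p[j] / (p[i] + p[j])
    return total


def transposition_limit(p):
    """Limiting mean comparisons of the transposition heuristic (all p[i] > 0).

    Weights each arrangement sigma by prod_k p[sigma[k]] ** (N - 1 - k),
    with 0-origin positions k, and averages the cost of sequential search.
    """
    n = len(p)
    weight_sum = Fraction(0)
    cost_sum = Fraction(0)
    for sigma in permutations(range(n)):
        weight = Fraction(1)
        for k, record in enumerate(sigma):
            weight *= Fraction(p[record]) ** (n - 1 - k)
        cost = sum((k + 1) * p[record] for k, record in enumerate(sigma))
        weight_sum += weight
        cost_sum += weight * cost
    return cost_sum / weight_sum


p = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
print(float(transposition_limit(p)), float(mtf_limit(p)))
```

Exhaustive enumeration is only practical for small N, but it avoids the slow convergence that makes simulation of the transposition rule unreliable.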

Tape  searching  with  unequal-length  records.  Now  let’s  give  the  problem 
still  another  twist:  Suppose  the  table  we  are  searching  is  stored  on  tape,  and 
the  individual  records  have  varying  lengths.  For  example,  in  an  old-fashioned 
operating  system,  the  “system  library  tape”  was  such  a file;  standard  system 
programs  such  as  compilers,  assemblers,  loading  routines,  and  report  generators 
were  the  records  on  this  tape,  and  most  user  jobs  would  start  by  searching 
down  the  tape  until  the  appropriate  routine  had  been  input.  This  setup  makes 
our  previous  analysis  of  Algorithm  S inapplicable,  since  step  S3  takes  a variable 
amount  of  time  each  time  we  reach  it.  The  number  of  comparisons  is  therefore 
not  the  only  criterion  of  interest. 

Let $L_i$ be the length of record $R_i$, and let $p_i$ be the probability that this record will be sought. The average running time of the search method will now be approximately proportional to

$$p_1L_1 + p_2(L_1+L_2) + \cdots + p_N(L_1+L_2+L_3+\cdots+L_N). \qquad (19)$$

When $L_1 = L_2 = \cdots = L_N = 1$, this reduces to (3), the case already studied.

It  seems  logical  to  put  the  most  frequently  needed  records  at  the  beginning 
of  the  tape;  but  this  is  sometimes  a bad  idea!  For  example,  assume  that  the  tape 
contains  just  two  programs,  A and  B,  where  A is  needed  twice  as  often  as  B but 
it  is  four  times  as  long.  Thus, 

$$N = 2,\quad p_A = \tfrac23,\quad L_A = 4,\quad p_B = \tfrac13,\quad L_B = 1.$$

If we place A first on tape, according to the “logical” principle stated above, the average running time is $\frac23\cdot4 + \frac13\cdot5 = \frac{13}{3}$; but if we use an “illogical” idea, placing B first, the average running time is reduced to $\frac13\cdot1 + \frac23\cdot5 = \frac{11}{3}$.
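The arithmetic of this example is easy to reproduce. The sketch below evaluates the cost (19) for a given order of records; the function name `tape_cost` and the dictionary representation of p and L are illustrative assumptions.

```python
from fractions import Fraction


def tape_cost(order, p, length):
    """Average search time (19) when records are placed on tape in `order`.

    Searching for a record reads every record up to and including it, so its
    cost is the running total of lengths at that point.
    """
    cost = Fraction(0)
    position = 0
    for record in order:
        position += length[record]
        cost += p[record] * position
    return cost


p = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
length = {"A": 4, "B": 1}
print(tape_cost("AB", p, length))   # 13/3: the "logical" order
print(tape_cost("BA", p, length))   # 11/3: placing B first is cheaper
```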

The  optimum  arrangement  of  programs  on  a library  tape  may  be  determined 
as  follows. 
Theorem S. Let $L_i$ and $p_i$ be as defined above. The arrangement of records in the table is optimal if and only if

$$p_1/L_1 \ge p_2/L_2 \ge \cdots \ge p_N/L_N. \qquad (20)$$

In other words, the minimum value of

$$p_{a_1}L_{a_1} + p_{a_2}(L_{a_1}+L_{a_2}) + \cdots + p_{a_N}(L_{a_1}+\cdots+L_{a_N}),$$

over all permutations $a_1a_2\ldots a_N$ of $\{1,2,\ldots,N\}$, is equal to (19) if and only if (20) holds.

Proof. Suppose that $R_i$ and $R_{i+1}$ are interchanged on the tape; the cost (19) changes from

$$\cdots + p_i(L_1+\cdots+L_{i-1}+L_i) + p_{i+1}(L_1+\cdots+L_{i+1}) + \cdots$$

to

$$\cdots + p_{i+1}(L_1+\cdots+L_{i-1}+L_{i+1}) + p_i(L_1+\cdots+L_{i+1}) + \cdots,$$

a net change of $p_iL_{i+1} - p_{i+1}L_i$. Therefore if $p_i/L_i < p_{i+1}/L_{i+1}$, such an interchange will improve the average running time, and the given arrangement is not optimal. It follows that (20) holds in any optimal arrangement.

Conversely,  assume  that  (20)  holds;  we  need  to  prove  that  the  arrangement 
is  optimal.  The  argument  just  given  shows  that  the  arrangement  is  “locally 
optimal”  in  the  sense  that  adjacent  interchanges  make  no  improvement;  but  there 
may  conceivably  be  a long,  complicated  sequence  of  interchanges  that  leads  to  a 
better  “global  optimum.”  We  shall  consider  two  proofs,  one  that  uses  computer 
science  and  one  that  uses  a mathematical  trick. 

First proof. Assume that (20) holds. We know that any permutation of the records can be sorted into the order $R_1R_2\ldots R_N$ by using a sequence of interchanges of adjacent records. Each of these interchanges replaces $\ldots R_jR_i\ldots$ by $\ldots R_iR_j\ldots$ for some $i < j$, so it decreases the search time by the nonnegative amount $p_iL_j - p_jL_i$. Therefore the order $R_1R_2\ldots R_N$ must have minimum search time.

Second proof. Replace each probability $p_i$ by

$$p_i(\epsilon) = p_i + \epsilon^i - (\epsilon^1+\epsilon^2+\cdots+\epsilon^N)/N, \qquad (21)$$

where ε is an extremely small positive number. When ε is sufficiently small, we will never have $x_1p_1(\epsilon)+\cdots+x_Np_N(\epsilon) = y_1p_1(\epsilon)+\cdots+y_Np_N(\epsilon)$ unless $x_1 = y_1$, ..., $x_N = y_N$; in particular, equality will not hold in (20). Consider now the N! permutations of the records; at least one of them is optimum, and we know that it satisfies (20). But only one permutation satisfies (20) because there are no equalities. Therefore (20) uniquely characterizes the optimum arrangement of records in the table for the probabilities $p_i(\epsilon)$, whenever ε is sufficiently small. By continuity, the same arrangement must also be optimum when ε is set equal to zero. (This “tie-breaking” type of proof is often useful in connection with combinatorial optimization.) |
Theorem S is due to W. E. Smith, Naval Research Logistics Quarterly 3
(1956),  59-66.  The  exercises  below  contain  further  results  about  optimum  file 
arrangements. 
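Theorem S turns the optimization into a sort. The sketch below sorts the records by the ratio $p_i/L_i$ and, as a sanity check on small random inputs, compares the result with an exhaustive search over all N! arrangements. All function names are hypothetical, and the random instances are illustrative, not data from the text.

```python
import random
from fractions import Fraction
from itertools import permutations


def tape_cost(order, p, length):
    """Cost (19) for records 0..N-1 placed in the given order."""
    cost, position = Fraction(0), 0
    for record in order:
        position += length[record]
        cost += p[record] * position
    return cost


def smith_order(p, length):
    """Theorem S: arrange records by nonincreasing ratio p_i / L_i."""
    return sorted(range(len(p)), key=lambda r: Fraction(p[r]) / length[r], reverse=True)


def brute_force_minimum(p, length):
    """Minimum of (19) over all N! arrangements (practical only for small N)."""
    return min(tape_cost(order, p, length) for order in permutations(range(len(p))))


def check_random_instances(trials=50, max_n=6, seed=7):
    """Compare Smith's order with exhaustive search on random small inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_n)
        weights = [rng.randint(1, 9) for _ in range(n)]
        p = [Fraction(w, sum(weights)) for w in weights]
        length = [rng.randint(1, 9) for _ in range(n)]
        if tape_cost(smith_order(p, length), p, length) != brute_force_minimum(p, length):
            return False
    return True


print(smith_order([Fraction(2, 3), Fraction(1, 3)], [4, 1]))  # [1, 0]: B before A
print(check_random_instances())                              # True
```

Ties in $p_i/L_i$ may be broken arbitrarily; by Theorem S every order satisfying (20) has the same, minimal, cost.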

EXERCISES 

1. [M20] When all the search keys are equally probable, what is the standard deviation of the number of comparisons made in a successful sequential search through a table of N records?

2.  [15]  Restate  the  steps  of  Algorithm  S,  using  linked-memory  notation  instead  of 
subscript  notation.  (If  P points  to  a record  in  the  table,  assume  that  KEY(P)  is  the  key, 
INFO(P)  is  the  associated  information,  and  LINK(P)  is  a pointer  to  the  next  record. 
Assume also that FIRST points to the first record, and that the last record points to Λ.)

3.  [16]  Write  a MIX  program  for  the  algorithm  of  exercise  2.  What  is  the  running 
time of your program, in terms of the quantities C and S in (1)?

► 4.  [17]  Does  the  idea  of  Algorithm  Q carry  over  from  subscript  notation  to  linked- 
memory  notation?  (See  exercise  2.) 

5.  [20]  Program  Q'  is,  of  course,  noticeably  faster  than  Program  Q,  when  C is  large. 
But  are  there  any  small  values  of  C and  S for  which  Program  Q'  actually  takes  more 
time  than  Program  Q? 

► 6.  [20]  Add  three  more  instructions  to  Program  Q',  reducing  its  running  time  to 
about  (3.33C  + constant) u. 

7. [M20] Evaluate the average number of comparisons, (3), using the “binary” probability distribution (5).

8. [HM22] Find an asymptotic series for $H_n^{(x)}$ as $n \to \infty$, when $x \ne 1$.

► 9. [HM28] The text observes that the probability distributions given by (11), (13), and (16) are roughly equivalent when 0 < θ < 1, and that the mean number of comparisons using (13) is $\frac{\theta}{\theta+1}N + O(N^{1-\theta})$.

a) Is the mean number of comparisons equal to $\frac{\theta}{\theta+1}N + O(N^{1-\theta})$ also when the probabilities of (11) are used?

b) What about (16)?

c) How do (11) and (16) compare to (13) when θ < 0?

10.  [M20]  The  best  arrangement  of  records  in  a sequential  table  is  specified  by  (4); 
what  is  the  worst  arrangement?  Show  that  the  average  number  of  comparisons  in  the 
worst  arrangement  has  a simple  relation  to  the  average  number  of  comparisons  in  the 
best  arrangement. 

11. [M30] The purpose of this exercise is to analyze the limiting behavior of a self-organizing file with the move-to-front heuristic. First we need to define some notation: Let $f_m(x_1,x_2,\ldots,x_m)$ be the infinite sum of all distinct ordered products $x_{i_1}x_{i_2}\ldots x_{i_k}$ such that $1 \le i_1,\ldots,i_k \le m$, where each of $x_1,x_2,\ldots,x_m$ appears in every term. For example,

$$f_2(x,y) = \sum_{j,k\ge0}\bigl(x^{1+j}y(x+y)^k + y^{1+j}x(x+y)^k\bigr) = \frac{xy}{1-x-y}\left(\frac{1}{1-x}+\frac{1}{1-y}\right).$$
Given a set X of n variables $\{x_1,\ldots,x_n\}$, let

$$P_{nm} = \sum_{1\le j_1<\cdots<j_m\le n} f_m(x_{j_1},\ldots,x_{j_m}); \qquad Q_{nm} = \sum_{1\le j_1<\cdots<j_m\le n}\frac{1}{1-x_{j_1}-\cdots-x_{j_m}}.$$

For example, $P_{32} = f_2(x_1,x_2) + f_2(x_1,x_3) + f_2(x_2,x_3)$ and $Q_{32} = 1/(1-x_1-x_2) + 1/(1-x_1-x_3) + 1/(1-x_2-x_3)$. By convention we set $P_{n0} = Q_{n0} = 1$.

a) Assume that the text's self-organizing file has been servicing requests for item $R_i$ with probability $p_i$. After the system has been running a long time, show that $R_i$ will be the $m$th item from the front with limiting probability $p_iP_{(N-1)(m-1)}$, where the set of variables X is $\{p_1,\ldots,p_{i-1},p_{i+1},\ldots,p_N\}$.

b) By summing the result of (a) for m = 1, 2, ..., we obtain the identity

$$P_{nn} + P_{n(n-1)} + \cdots + P_{n0} = Q_{nn}.$$

Prove that, consequently,

$$P_{nm} + \binom{n-m+1}{1}P_{n(m-1)} + \cdots + \binom{n-m+m}{m}P_{n0} = Q_{nm};$$
$$Q_{nm} - \binom{n-m+1}{1}Q_{n(m-1)} + \cdots + (-1)^m\binom{n-m+m}{m}Q_{n0} = P_{nm}.$$

c) Compute the limiting average distance $d_i = \sum_{m\ge1} m\,p_iP_{(N-1)(m-1)}$ of $R_i$ from the front of the list; then evaluate $\bar C_N = \sum_{i=1}^{N} p_id_i$.

12.  [MSS]  Use  (17)  to  evaluate  the  average  number  of  comparisons  needed  to  search 
the  self-organizing  file  when  the  search  keys  have  the  binary  probability  distribution  (5). 

13. [M27] Use (17) to evaluate $\bar C_N$ for the wedge-shaped probability distribution (6).

14. [M21] Given two sequences $(x_1,x_2,\ldots,x_n)$ and $(y_1,y_2,\ldots,y_n)$ of real numbers, what permutation $a_1a_2\ldots a_n$ of the subscripts will make $\sum_i x_iy_{a_i}$ a maximum? What permutation will make it a minimum?

► 15.  [M22]  The  text  shows  how  to  arrange  programs  optimally  on  a system  library 
tape,  when  only  one  program  is  being  sought.  But  another  set  of  assumptions  is  more 
appropriate  for  a subroutine  library  tape,  from  which  we  may  wish  to  load  various 
subroutines  called  for  in  a user’s  program. 

For this case let us suppose that subroutine j is desired with probability $P_j$, independently of whether or not other subroutines are desired. Then, for example, the probability that no subroutines at all are needed is $(1-P_1)(1-P_2)\ldots(1-P_N)$; and the probability that the search will end just after loading the jth subroutine is $P_j(1-P_{j+1})\ldots(1-P_N)$. If $L_j$ is the length of subroutine j, the average search time will therefore be essentially proportional to

$$L_1P_1(1-P_2)\ldots(1-P_N) + (L_1+L_2)P_2(1-P_3)\ldots(1-P_N) + \cdots + (L_1+\cdots+L_N)P_N.$$

What is the optimum arrangement of subroutines on the tape, under these assumptions?


16. [M22] (H. Riesel.) We often need to test whether or not n given conditions are all simultaneously true. (For example, we may want to test whether both x > 0 and $y < z^2$, and it is not immediately clear which condition should be tested first.) Suppose that the testing of condition j costs $T_j$ units of time, and that the condition will be true with probability $p_j$, independent of the outcomes of all the other conditions. In what order should we make the tests?
Fig. 2. An “organ-pipe arrangement” of probabilities minimizes the average seek time
in  a catenated  search. 

17. [M23] (J. R. Jackson.) Suppose you have to do n jobs; the jth job takes $T_j$ units of time, and it has a deadline $D_j$. In other words, the jth job is supposed to be finished after at most $D_j$ units of time have elapsed. What schedule $a_1a_2\ldots a_n$ for processing the jobs will minimize the maximum tardiness, namely

$$\max(T_{a_1} - D_{a_1},\ T_{a_1}+T_{a_2} - D_{a_2},\ \ldots,\ T_{a_1}+T_{a_2}+\cdots+T_{a_n} - D_{a_n})?$$

18. [M30] (Catenated search.) Suppose that N records are located in a linear array $R_1\ldots R_N$, with probability $p_j$ that record $R_j$ will be sought. A search process is called “catenated” if each search begins where the last one left off. If consecutive searches are independent, the average time required will be $\sum_{1\le i,j\le N} p_ip_jd(i,j)$, where $d(i,j)$ represents the amount of time to do a search that starts at position i and ends at position j. This model can be applied, for example, to disk file seek time, if $d(i,j)$ is the time needed to travel from cylinder i to cylinder j.

The object of this exercise is to characterize the optimum placement of records for catenated searches, whenever $d(i,j)$ is an increasing function of $|i-j|$, that is, whenever we have $d(i,j) = d_{|i-j|}$ for $d_1 < d_2 < \cdots < d_{N-1}$. (The value of $d_0$ is irrelevant.) Prove that in this case the records are optimally placed, among all N! permutations, if and only if either $p_1 \le p_N \le p_2 \le p_{N-1} \le \cdots \le p_{\lfloor N/2\rfloor+1}$ or $p_N \le p_1 \le p_{N-1} \le p_2 \le \cdots \le p_{\lceil N/2\rceil}$. (Thus, an “organ-pipe arrangement” of probabilities is best, as shown in Fig. 2.) Hint: Consider any arrangement where the respective probabilities are $q_1q_2\ldots q_k\,s\,r_k\ldots r_2r_1\,t_1\ldots t_m$, for some $m \ge 0$ and $k > 0$; $N = 2k+m+1$. Show that the rearrangement $q'_1q'_2\ldots q'_k\,s\,r'_k\ldots r'_2r'_1\,t_1\ldots t_m$ is better, where $q'_i = \min(q_i, r_i)$ and $r'_i = \max(q_i, r_i)$, except when $q'_i = q_i$ and $r'_i = r_i$ for all i or when $q'_i = r_i$ and $r'_i = q_i$ and $t_j = 0$ for all i and j. The same holds true when s is not present and $N = 2k+m$.

19. [M20] Continuing exercise 18, what are the optimal arrangements for catenated searches when the function $d(i,j)$ has the property that $d(i,j) + d(j,i) = c$ for all $i \ne j$? [This situation occurs, for example, on tapes without read-backwards capability, when we do not know the appropriate direction to search; for i < j we have, say,

$$d(i,j) = a + b(L_{i+1}+\cdots+L_j) \quad\text{and}\quad d(j,i) = a + b(L_{j+1}+\cdots+L_N) + r + b(L_1+\cdots+L_i),$$

where r is the rewind time.]

20. [M28] Continuing exercise 18, what are the optimal arrangements for catenated searches when the function $d(i,j)$ is $\min(d_{|i-j|}, d_{n-|i-j|})$, for $d_1 < d_2 < \cdots$? [This situation occurs, for example, in a two-way linked circular list, or in a two-way shift-register storage device.]
21. [M28] Consider an n-dimensional cube whose vertices have coordinates $(d_1,\ldots,d_n)$ with $d_j = 0$ or 1; two vertices are called adjacent if they differ in exactly one coordinate. Suppose that a set of $2^n$ numbers $x_0 \le x_1 \le \cdots \le x_{2^n-1}$ is to be assigned to the $2^n$ vertices in such a way that $\sum_{i,j}|x_i - x_j|$ is minimized, where the sum is over all i and j such that $x_i$ and $x_j$ have been assigned to adjacent vertices. Prove that this minimum will be achieved if, for all j, $x_j$ is assigned to the vertex whose coordinates are the binary representation of j.

► 22.  [20]  Suppose  you  want  to  search  a large  file,  not  for  equality  but  to  find  the  1000 
records  that  are  closest  to  a given  key,  in  the  sense  that  these  1000  records  have  the 
smallest values of $d(K_j, K)$ for some given distance function d. What data structure is
most  appropriate  for  such  a sequential  search? 


Attempt  the  end,  and  never  stand  to  doubt; 
Nothing's  so  hard,  but  search  will  find  it  out. 

— ROBERT  HERRICK,  Seeke  and  finde  (1648) 
6.2. SEARCHING BY COMPARISON OF KEYS

IN THIS SECTION we shall discuss search methods that are based on a linear ordering of the keys, such as alphabetic order or numeric order. After comparing the given argument K to a key $K_i$ in the table, the search continues in three different ways, depending on whether $K < K_i$, $K = K_i$, or $K > K_i$. The sequential search methods of Section 6.1 were essentially limited to a two-way decision ($K = K_i$ versus $K \ne K_i$), but if we free ourselves from the restriction of sequential access we are able to make effective use of an order relation.

6.2.1.  Searching  an  Ordered  Table 

What  would  you  do  if  someone  handed  you  a large  telephone  directory  and 
told  you  to  find  the  name  of  the  person  whose  number  is  795-6841?  There  is 
no  better  way  to  tackle  this  problem  than  to  use  the  sequential  methods  of 
Section  6.1.  (Well,  you  might  try  to  dial  the  number  and  talk  to  the  person  who 
answers;  or  you  might  know  how  to  obtain  a special  directory  that  is  sorted  by 
number  instead  of  by  name.)  The  point  is  that  it  is  much  easier  to  find  an  entry 
by  the  party’s  name,  instead  of  by  number,  although  the  telephone  directory 
contains  all  the  information  necessary  in  both  cases.  When  a large  file  must 
be  searched,  sequential  scanning  is  almost  out  of  the  question,  but  an  ordering 
relation  simplifies  the  job  enormously. 

With  so  many  sorting  methods  at  our  disposal  (Chapter  5),  we  will  have  little 
difficulty  rearranging  a file  into  order  so  that  it  may  be  searched  conveniently. 
Of  course,  if  we  need  to  search  the  table  only  once,  a sequential  search  would 
be  faster  than  to  do  a complete  sort  of  the  file;  but  if  we  need  to  make  repeated 
searches  in  the  same  file,  we  are  better  off  having  it  in  order.  Therefore  in  this 
section  we  shall  concentrate  on  methods  that  are  appropriate  for  searching  a 
table  whose  keys  satisfy 

$$K_1 < K_2 < \cdots < K_N,$$

assuming that we can easily access the key in any given position. After comparing K to $K_i$ in such a table, we have either

• $K < K_i$ [$R_i, R_{i+1}, \ldots, R_N$ are eliminated from consideration];

or • $K = K_i$ [the search is done];

or • $K > K_i$ [$R_1, R_2, \ldots, R_i$ are eliminated from consideration].

In  each  of  these  three  cases,  substantial  progress  has  been  made,  unless  i is 
near  one  of  the  ends  of  the  table;  this  is  why  the  ordering  leads  to  an  efficient 
algorithm. 

Binary  search.  Perhaps  the  first  such  method  that  suggests  itself  is  to  start  by 
comparing  K to  the  middle  key  in  the  table;  the  result  of  this  probe  tells  which 
half  of  the  table  should  be  searched  next,  and  the  same  procedure  can  be  used 
again, comparing K to the middle key of the selected half, etc. After at most about lg N comparisons, we will have found the key or we will have established that it is not present. This procedure is sometimes known as “logarithmic search” or “bisection,” but it is most commonly called binary search.

Fig. 3. Binary search.

Although  the  basic  idea  of  binary  search  is  comparatively  straightforward, 
the  details  can  be  surprisingly  tricky,  and  many  good  programmers  have  done  it 
wrong  the  first  few  times  they  tried.  One  of  the  most  popular  correct  forms  of 
the  algorithm  makes  use  of  two  pointers,  l and  u,  that  indicate  the  current  lower 
and  upper  limits  for  the  search,  as  follows: 

Algorithm B (Binary search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches for a given argument K.

B1. [Initialize.] Set l ← 1, u ← N.

B2. [Get midpoint.] (At this point we know that if K is in the table, it satisfies $K_l \le K \le K_u$. A more precise statement of the situation appears in exercise 1 below.) If u < l, the algorithm terminates unsuccessfully. Otherwise, set $i \leftarrow \lfloor (l+u)/2 \rfloor$, the approximate midpoint of the relevant table area.

B3. [Compare.] If $K < K_i$, go to B4; if $K > K_i$, go to B5; and if $K = K_i$, the algorithm terminates successfully.

B4. [Adjust u.] Set u ← i − 1 and return to B2.

B5. [Adjust l.] Set l ← i + 1 and return to B2. |
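Algorithm B translates almost step for step into a high-level language. The sketch below is a minimal Python rendering, assuming a 1-origin table stored in a list whose element 0 is an unused placeholder; the function name `binary_search`, the comparison counter, and the return convention (i = 0 for failure) are illustrative additions. The table is the one that appears in Fig. 4.

```python
def binary_search(keys, k):
    """Algorithm B for a table keys[1..N] with keys[1] < keys[2] < ... < keys[N].

    keys[0] is an unused placeholder so that subscripts match the text.
    Returns (i, comparisons); i == 0 signals an unsuccessful search.
    """
    lower, upper = 1, len(keys) - 1          # B1: l <- 1, u <- N
    comparisons = 0
    while lower <= upper:                    # B2: stop unsuccessfully if u < l
        i = (lower + upper) // 2             #     i <- floor((l + u) / 2)
        comparisons += 1                     # B3: one three-way comparison
        if k < keys[i]:
            upper = i - 1                    # B4: u <- i - 1
        elif k > keys[i]:
            lower = i + 1                    # B5: l <- i + 1
        else:
            return i, comparisons            # K = K_i: success
    return 0, comparisons


TABLE = [None, 61, 87, 154, 170, 275, 426, 503, 509,
         512, 612, 653, 677, 703, 765, 897, 908]
print(binary_search(TABLE, 653))   # (11, 4)
print(binary_search(TABLE, 400))   # (0, 4)
```

Python integers do not overflow, so `(lower + upper) // 2` is safe here; in languages with fixed-width integers, computing `lower + (upper - lower) / 2` is a common precaution.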

Figure 4 illustrates two cases of this binary search algorithm: first to search for the argument 653, which is present in the table, and then to search for 400, which is absent. The brackets indicate l and u, and the underlined key represents $K_i$. In both examples the search terminates after making four comparisons.
a) Searching for 653:

[061 087 154 170 275 426 503 _509_ 512 612 653 677 703 765 897 908]
061 087 154 170 275 426 503 509 [512 612 653 _677_ 703 765 897 908]
061 087 154 170 275 426 503 509 [512 _612_ 653] 677 703 765 897 908
061 087 154 170 275 426 503 509 512 612 [_653_] 677 703 765 897 908

b) Searching for 400:

[061 087 154 170 275 426 503 _509_ 512 612 653 677 703 765 897 908]
[061 087 154 _170_ 275 426 503] 509 512 612 653 677 703 765 897 908
061 087 154 170 [275 _426_ 503] 509 512 612 653 677 703 765 897 908
061 087 154 170 [_275_] 426 503 509 512 612 653 677 703 765 897 908
061 087 154 170 275] [426 503 509 512 612 653 677 703 765 897 908

Fig. 4. Examples of binary search.
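Each row of Fig. 4 is simply the state (l, u) at an entry to step B2. The hypothetical `bracket_rows` sketch below regenerates the rows, marking l and u with brackets and, as a plain-text stand-in for underlining, wrapping the probed key $K_i$ in underscores; it assumes the keys are at most three-digit integers, as in the figure.

```python
KEYS = [61, 87, 154, 170, 275, 426, 503, 509,
        512, 612, 653, 677, 703, 765, 897, 908]


def bracket_rows(keys, k):
    """Rows in the style of Fig. 4, one per entry to step B2 of Algorithm B.

    keys holds K_1..K_N in a 0-origin Python list; l, u, i below are 1-origin.
    Brackets mark l and u; the probed key K_i is wrapped in underscores.
    """
    lower, upper = 1, len(keys)
    rows = []
    while True:
        i = (lower + upper) // 2 if lower <= upper else None
        cells = []
        for j, key in enumerate(keys, start=1):
            cell = f"{key:03d}"
            if j == i:
                cell = f"_{cell}_"
            if j == lower:
                cell = "[" + cell
            if j == upper:
                cell += "]"
            cells.append(cell)
        rows.append(" ".join(cells))
        if i is None or k == keys[i - 1]:    # unsuccessful (u < l) or found
            return rows
        if k < keys[i - 1]:
            upper = i - 1                    # step B4
        else:
            lower = i + 1                    # step B5


for row in bracket_rows(KEYS, 400):
    print(row)
```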


Program B (Binary search). As in the programs of Section 6.1, we assume here that $K_i$ is a full-word key appearing in location KEY + i. The following code uses rI1 ≡ l, rI2 ≡ u, rI3 ≡ i.

01  START  ENT1  1        1        B1. Initialize. l ← 1.
02         ENT2  N        1        u ← N.
03         JMP   2F       1        To B2.
04  5H     JE    SUCCESS  C1       Jump if K = K_i.
05         ENT1  1,3      C1−S     B5. Adjust l. l ← i + 1.
06  2H     ENTA  0,1      C+1−S    B2. Get midpoint.
07         INCA  0,2      C+1−S    rA ← l + u.
08         SRB   1        C+1−S    rA ← ⌊rA/2⌋. (rX changes too.)
09         STA   TEMP     C+1−S
10         CMP1  TEMP     C+1−S
11         JG    FAILURE  C+1−S    Jump if u < l.
12         LD3   TEMP     C        i ← midpoint.
13  3H     LDA   K        C        B3. Compare.
14         CMPA  KEY,3    C
15         JGE   5B       C        Jump if K ≥ K_i.
16         ENT2  -1,3     C2       B4. Adjust u. u ← i − 1.
17         JMP   2B       C2       To B2. |

This procedure doesn't blend with MIX quite as smoothly as the other algorithms we have seen, because MIX does not allow much arithmetic in index registers. The running time is (18C − 10S + 12)u, where C = C1 + C2 is the number of comparisons made (the number of times step B3 is performed), and S = [outcome is successful]. The operation on line 08 of this program is “shift right binary 1,” which is legitimate only on binary versions of MIX; for general byte size, this instruction should be replaced by “MUL =1//2+1=”, increasing the running time to (26C − 18S + 20)u.

A tree  representation.  In  order  to  really  understand  what  is  happening  in 
Algorithm  B,  our  best  bet  is  to  think  of  the  procedure  as  a binary  decision  tree, 
as  shown  in  Fig.  5 for  the  case  N = 16. 
Fig. 5. A comparison tree that corresponds to binary search when N = 16.


When N is 16, the first comparison made by the algorithm is $K : K_8$; this is represented by the root node (8) in the figure. Then if $K < K_8$, the algorithm follows the left subtree, comparing K to $K_4$; similarly if $K > K_8$, the right subtree is used. An unsuccessful search will lead to one of the external square nodes numbered [0] through [16]; for example, we reach node [6] if and only if $K_6 < K < K_7$.

The binary tree corresponding to a binary search on N records can be constructed as follows: If N = 0, the tree is simply [0]. Otherwise the root node is ($\lceil N/2\rceil$), the left subtree is the corresponding binary tree with $\lceil N/2\rceil - 1$ nodes, and the right subtree is the corresponding binary tree with $\lfloor N/2\rfloor$ nodes and with all node numbers increased by $\lceil N/2\rceil$.
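The recursive construction just described can be written down directly. In the sketch below, which is an illustration with hypothetical names, an external node is represented by `None` and an internal node by a tuple (label, left, right); `path_to` follows the comparisons that Algorithm B would make for a key that is present.

```python
def search_tree(n, offset=0):
    """Comparison tree of Algorithm B for keys offset+1 .. offset+n.

    Follows the recursive construction in the text. An external node is None;
    an internal node is a tuple (label, left_subtree, right_subtree).
    """
    if n == 0:
        return None
    root = (n + 1) // 2                          # ceil(n/2)
    left = search_tree(root - 1, offset)         # ceil(n/2) - 1 nodes
    right = search_tree(n // 2, offset + root)   # floor(n/2) nodes, relabeled
    return (offset + root, left, right)


def symmetric_order(tree):
    """Internal node labels read from left to right."""
    if tree is None:
        return []
    label, left, right = tree
    return symmetric_order(left) + [label] + symmetric_order(right)


def path_to(tree, target):
    """Labels of the internal nodes visited while searching for target."""
    path = []
    while tree is not None:
        label, left, right = tree
        path.append(label)
        if target == label:
            break
        tree = left if target < label else right
    return path


t = search_tree(16)
print(t[0], t[1][0], t[2][0])                    # 8 4 12
print(symmetric_order(t) == list(range(1, 17)))  # True
print(path_to(t, 10))                            # [8, 12, 10]
```

The path [8, 12, 10] matches the comparisons $K > K_8$, $K < K_{12}$, $K = K_{10}$ that the text traces in Fig. 5.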

In an analogous fashion, any algorithm for searching an ordered table of length N by means of comparisons can be represented as an N-node binary tree in which the nodes are labeled with the numbers 1 to N (unless the algorithm makes redundant comparisons). Conversely, any binary tree corresponds to a valid method for searching an ordered table; we simply label the nodes

[0] (1) [1] (2) [2] ⋯ [N−1] (N) [N]

in symmetric order, from left to right.

If the search argument input to Algorithm B is $K_{10}$, the algorithm makes the
comparisons $K > K_8$, $K < K_{12}$, $K = K_{10}$. This corresponds to the path from
the root to (10) in Fig. 5. Similarly, the behavior of Algorithm B on other keys
corresponds  to  the  other  paths  leading  from  the  root  of  the  tree.  The  method  of 
constructing  the  binary  trees  corresponding  to  Algorithm  B therefore  makes  it 
easy  to  prove  the  following  result  by  induction  on  N: 

Theorem B. If $2^{k-1} \le N < 2^k$, a successful search using Algorithm B requires (min 1, max k) comparisons. If $N = 2^k - 1$, an unsuccessful search requires k comparisons; and if $2^{k-1} \le N < 2^k - 1$, an unsuccessful search requires either k − 1 or k comparisons. |
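Theorem B is easy to confirm by brute force for small N. The sketch below counts the comparisons made by a 0-origin version of Algorithm B (it probes the same records as the 1-origin text, since $\lfloor(l+u)/2\rfloor$ shifts by exactly one) for every key and every gap, using the hypothetical keys $K_j = 2j$ so that the odd numbers 1, 3, ..., 2N + 1 fall into the N + 1 gaps.

```python
def count_comparisons(keys, k):
    """Comparisons made by Algorithm B on a 0-origin sorted list."""
    lower, upper, comparisons = 0, len(keys) - 1, 0
    while lower <= upper:
        i = (lower + upper) // 2
        comparisons += 1
        if k < keys[i]:
            upper = i - 1
        elif k > keys[i]:
            lower = i + 1
        else:
            break
    return comparisons


def check_theorem_b(n):
    """Verify Theorem B for one N by trying every key and every gap."""
    keys = [2 * j for j in range(1, n + 1)]                # K_j = 2j
    k = n.bit_length()                                     # 2**(k-1) <= n < 2**k
    hits = [count_comparisons(keys, key) for key in keys]
    misses = [count_comparisons(keys, 2 * j + 1) for j in range(n + 1)]  # gaps
    expected_misses = {k} if n == 2 ** k - 1 else {k - 1, k}
    return min(hits) == 1 and max(hits) == k and set(misses) == expected_misses


print(all(check_theorem_b(n) for n in range(1, 300)))     # True
```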


Further analysis of binary search. (Nonmathematical readers should skip to Eq. (4).) The tree representation also shows us how to compute the average number of comparisons in a simple way. Let $C_N$ be the average number of comparisons in a successful search, assuming that each of the $N$ keys is an equally likely argument; and let $C'_N$ be the average number of comparisons in an unsuccessful search, assuming that each of the $N + 1$ intervals between and outside the extreme values of the keys is equally likely. Then we have


$$C_N = 1 + \frac{\text{internal path length of tree}}{N}, \qquad C'_N = \frac{\text{external path length of tree}}{N + 1}, \tag{1}$$


by the definition of internal and external path length. We saw in Eq. 2.3.4.5–(3) that the external path length is always $2N$ more than the internal path length. Hence there is a rather unexpected relationship between $C_N$ and $C'_N$:


$$C_N = \Bigl(1 + \frac{1}{N}\Bigr) C'_N - 1. \tag{2}$$
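The step from (1) to (2) is short enough to write out. In this sketch $I$ and $E$ are shorthand introduced only here, for the internal and external path lengths in (1), and $E = I + 2N$ is the relation quoted from Eq. 2.3.4.5–(3):

$$\begin{aligned}
I &= N(C_N - 1), \qquad E = (N + 1)\,C'_N = I + 2N,\\
(N + 1)\,C'_N &= N(C_N - 1) + 2N = N(C_N + 1),\\
C_N &= \frac{N + 1}{N}\,C'_N - 1 = \Bigl(1 + \frac{1}{N}\Bigr)C'_N - 1.
\end{aligned}$$

For example, the three-node tree with root (2) has $I = 2$ and $E = 8$, so $C_3 = 1 + 2/3 = 5/3$ and $C'_3 = 8/4 = 2$; indeed $(1 + 1/3)\cdot 2 - 1 = 5/3$.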


This  formula,  which  is  due  to  T.  N.  Hibbard  [JACM  9 (1962),  16-17],  holds 
for  all  search  methods  that  correspond  to  binary  trees;  in  other  words,  it  holds 
for  all  methods  that  are  based  on  nonredundant  comparisons.  The  variance  of 
successful-search  comparisons  can  also  be  expressed  in  terms  of  the  corresponding 
variance  for  unsuccessful  searches  (see  exercise  25). 

From  the  formulas  above  we  can  see  that  the  “best”  way  to  search  by 
comparisons  is  one  whose  tree  has  minimum  external  path  length,  over  all  binary 
trees  with  N internal  nodes.  Fortunately  it  can  be  proved  that  Algorithm  B is 
optimum  in  this  sense,  for  all  N;  for  we  have  seen  (exercise  5.3.1-20)  that  a 
binary  tree  has  minimum  path  length  if  and  only  if  its  external  nodes  all  occur 
on  at  most  two  adjacent  levels.  It  follows  that  the  external  path  length  of  the 
tree  corresponding  to  Algorithm  B is 

$$(N + 1)\bigl(\lfloor \lg N\rfloor + 2\bigr) - 2^{\lfloor \lg N\rfloor + 1}. \tag{3}$$

(See  Eq.  5.3.1-(34).)  From  this  formula  and  (2)  we  can  compute  the  exact 
average  number  of  comparisons,  assuming  that  all  search  arguments  are  equally 
probable. 


   N     C_N      C'_N
   1     1        1
   2     1 1/2    1 2/3
   3     1 2/3    2
   4     2        2 2/5
   5     2 1/5    2 4/6
   6     2 2/6    2 6/7
   7     2 3/7    3
   8     2 5/8    3 2/9
   9     2 7/9    3 4/10
  10     2 9/10   3 6/11
  11     3        3 8/12
  12     3 1/12   3 10/13
  13     3 2/13   3 12/14
  14     3 3/14   3 14/15
  15     3 4/15   4
  16     3 6/16   4 2/17

In general, if $k = \lfloor \lg N\rfloor$, we have

$$\begin{aligned}
C_N &= k + 1 - (2^{k+1} - k - 2)/N = \lg N - 1 + \epsilon + (k + 2)/N,\\
C'_N &= k + 2 - 2^{k+1}/(N + 1) = \lg(N + 1) + \epsilon',
\end{aligned} \tag{4}$$

where $0 \le \epsilon, \epsilon' < 0.0861$; see Eq. 5.3.1–(35).
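The table and the closed forms in (4) can be checked mechanically with exact rational arithmetic. The sketch below (helper names hypothetical) evaluates the external path length (3), turns it into $C'_N$ by (1) and into $C_N$ by Hibbard's relation (2), and asserts agreement with the first forms in (4).

```python
from fractions import Fraction


def external_path_length(n):
    """Eq. (3): external path length of the binary-search tree with n >= 1 nodes."""
    k = n.bit_length() - 1                   # k = floor(lg n)
    return (n + 1) * (k + 2) - 2 ** (k + 1)


def average_comparisons(n):
    """Return (C_N, C'_N) exactly, all search outcomes being equally likely."""
    c_unsuccessful = Fraction(external_path_length(n), n + 1)   # from (1)
    c_successful = (1 + Fraction(1, n)) * c_unsuccessful - 1     # relation (2)
    return c_successful, c_unsuccessful


for n in range(1, 17):
    c, c_prime = average_comparisons(n)
    k = n.bit_length() - 1
    # The exact closed forms in (4), before the lg approximations.
    assert c == k + 1 - Fraction(2 ** (k + 1) - k - 2, n)
    assert c_prime == k + 2 - Fraction(2 ** (k + 1), n + 1)
    print(n, c, c_prime)                     # last line printed: 16 27/8 70/17
```

`Fraction` reduces to lowest terms, so an entry written $2\,2/6$ in the table is printed as `7/3`.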
To summarize: Algorithm B never makes more than $\lfloor \lg N\rfloor + 1$ comparisons, and it makes about $\lg N - 1$ comparisons in an average successful search. No search method based on comparisons can do better than this. The average running time of Program B is approximately

$$\begin{aligned}
&(18 \lg N - 16)u &&\text{for a successful search,}\\
&(18 \lg N + 12)u &&\text{for an unsuccessful search,}
\end{aligned} \tag{5}$$

if we assume that all outcomes of the search are equally likely.

An important variation. Instead of using three pointers $l$, $i$, and $u$ in the search, it is tempting to use only two, namely the current position $i$ and its rate of change, $\delta$; after each unequal comparison, we could then set $i \gets i \pm \delta$ and $\delta \gets \delta/2$ (approximately). It is possible to do this, but only if extreme care is paid to the details, as in the following algorithm. Simpler approaches are doomed to failure!

Algorithm U (Uniform binary search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches for a given argument $K$. If $N$ is even, the algorithm will sometimes refer to a dummy key $K_0$ that should be set to $-\infty$ (or any value less than $K$). We assume that $N \ge 1$.

U1. [Initialize.] Set $i \gets \lceil N/2\rceil$, $m \gets \lfloor N/2\rfloor$.

U2. [Compare.] If $K < K_i$, go to U3; if $K > K_i$, go to U4; and if $K = K_i$, the algorithm terminates successfully.

U3. [Decrease $i$.] (We have pinpointed the search to an interval that contains either $m$ or $m - 1$ records; $i$ points just to the right of this interval.) If $m = 0$, the algorithm terminates unsuccessfully. Otherwise set $i \gets i - \lceil m/2\rceil$; then set $m \gets \lfloor m/2\rfloor$ and return to U2.

U4. [Increase $i$.] (We have pinpointed the search to an interval that contains either $m$ or $m - 1$ records; $i$ points just to the left of this interval.) If $m = 0$, the algorithm terminates unsuccessfully. Otherwise set $i \gets i + \lceil m/2\rceil$; then set $m \gets \lfloor m/2\rfloor$ and return to U2. |
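For readers who want to experiment, here is a minimal Python sketch of Algorithm U. It is an illustration under stated assumptions, not a transcription of any program in the text: `keys` is assumed to be a sorted Python list holding $K_1, \ldots, K_N$, the function name `uniform_search_u` is hypothetical, and the dummy key $K_0$ is represented by `float("-inf")`, which compares correctly with ints and floats.

```python
def uniform_search_u(keys, K):
    """Algorithm U sketch: return the 1-based index i with K_i == K, else None."""
    N = len(keys)
    if N < 1:
        raise ValueError("Algorithm U assumes N >= 1")
    table = [float("-inf")] + list(keys)     # table[i] plays the role of K_i
    i, m = (N + 1) // 2, N // 2              # U1: i <- ceil(N/2), m <- floor(N/2)
    while True:
        if K < table[i]:                     # U2 -> U3: decrease i
            if m == 0:
                return None
            i -= (m + 1) // 2                # i <- i - ceil(m/2)
        elif K > table[i]:                   # U2 -> U4: increase i
            if m == 0:
                return None
            i += (m + 1) // 2                # i <- i + ceil(m/2)
        else:
            return i                         # K = K_i: success
        m //= 2                              # m <- floor(m/2), common to U3 and U4


print(uniform_search_u([10, 20, 30, 40], 30))   # 3
print(uniform_search_u([10, 20, 30, 40], 5))    # None (after probing the dummy K_0)
```

The loop mirrors steps U2 through U4 directly. In the second call $N = 4$ is even, and the search does reach the dummy key $K_0$ before giving up.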

Figure 6 shows the corresponding binary tree for the search, when $N = 10$. In an unsuccessful search, the algorithm may make a redundant comparison just before termination; those nodes are shaded in the figure. We may call the search process uniform because the difference between the number of a node on level $l$ and the number of its ancestor on level $l - 1$ has a constant value $\delta$ for all nodes on level $l$.

The theory underlying Algorithm U can be understood as follows: Suppose that we have an interval of length $n - 1$ to search; a comparison with the middle element (for $n$ even) or with one of the two middle elements (for $n$ odd) leaves us with two intervals of lengths $\lfloor n/2\rfloor - 1$ and $\lceil n/2\rceil - 1$. After repeating this process $k$ times, we obtain $2^k$ intervals, of which the smallest has length $\lfloor n/2^k\rfloor - 1$ and the largest has length $\lceil n/2^k\rceil - 1$. Hence the lengths of two intervals at the same level differ by at most unity; this makes it possible to choose an appropriate "middle" element, without keeping track of the exact lengths.

Fig. 6. The comparison tree for a "uniform" binary search, when N = 10.

The principal advantage of Algorithm U is that we need not maintain the value of $m$ at all; we need only refer to a short table of the various values of $\delta$ to use at each level of the tree. Thus the algorithm reduces to the following procedure, which is equally good on binary or decimal computers:


Algorithm C (Uniform binary search). This algorithm is just like Algorithm U, but it uses an auxiliary table in place of the calculations involving $m$. The table entries are

$$\mathrm{DELTA}[j] = \left\lfloor \frac{N + 2^{j-1}}{2^j} \right\rfloor, \qquad \text{for } 1 \le j \le \lfloor \lg N\rfloor + 2. \tag{6}$$


C1. [Initialize.] Set $i \gets$ DELTA[1], $j \gets 2$.

C2. [Compare.] If $K < K_i$, go to C3; if $K > K_i$, go to C4; and if $K = K_i$, the algorithm terminates successfully.

C3. [Decrease $i$.] If DELTA[$j$] = 0, the algorithm terminates unsuccessfully. Otherwise, set $i \gets i -$ DELTA[$j$], $j \gets j + 1$, and go to C2.

C4. [Increase $i$.] If DELTA[$j$] = 0, the algorithm terminates unsuccessfully. Otherwise, set $i \gets i +$ DELTA[$j$], $j \gets j + 1$, and go to C2. |

Exercise 8 proves that this algorithm refers to the artificial key $K_0 = -\infty$ only when $N$ is even.
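Algorithm C can be sketched the same way. In this Python illustration the names `delta_table` and `uniform_search_c` are hypothetical, `keys` is assumed to be a sorted list of $K_1, \ldots, K_N$ with $N \ge 1$, and `delta[0]` is an unused placeholder so that the indices agree with (6).

```python
def delta_table(N):
    """Table (6): DELTA[j] = floor((N + 2**(j-1)) / 2**j) for 1 <= j <= floor(lg N) + 2.

    Entry 0 is an unused placeholder so that indices match the text.
    """
    top = N.bit_length() + 1                 # floor(lg N) + 2, valid for N >= 1
    return [None] + [(N + (1 << (j - 1))) >> j for j in range(1, top + 1)]


def uniform_search_c(keys, K, delta=None):
    """Algorithm C sketch: 1-based index of K in the sorted list keys, else None."""
    N = len(keys)
    if delta is None:
        delta = delta_table(N)               # normally precomputed once per N
    table = [float("-inf")] + list(keys)     # table[0] is the dummy key K_0
    i, j = delta[1], 2                       # C1
    while True:
        if K == table[i]:                    # C2: success
            return i
        if delta[j] == 0:                    # C3/C4: no increment left, so fail
            return None
        if K < table[i]:
            i -= delta[j]                    # C3: decrease i
        else:
            i += delta[j]                    # C4: increase i
        j += 1


print(delta_table(10))                       # [None, 5, 3, 1, 1, 0]
```

For $N = 10$ the increments are 5, 3, 1, 1, 0, so the first probe is $K_5$. Since the table depends only on $N$, it would normally be computed once and reused for many searches; the loop body then needs no halving at all.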

Program C (Uniform binary search). This program does the same job as Program B, using Algorithm C with rA ≡ K, rI1 ≡ i, rI2 ≡ j, rI3 ≡ DELTA[j].

  01  START    ENT1  N+1/2     1         C1. Initialize. i ← (N + 1)/2.
  02           ENT2  2         1         j ← 2.
  03           LDA   K         1
  04           JMP   2F        1
  05  3H       JE    SUCCESS   C1        Jump if K = K_i.
  06           J3Z   FAILURE   C1−S      Jump if DELTA[j] = 0.
  07           DEC1  0,3       C1−S−A    C3. Decrease i.
  08  5H       INC2  1         C−1       j ← j + 1.
  09  2H       LD3   DELTA,2   C         C2. Compare.
  10           CMPA  KEY,1     C
  11           JLE   3B        C         Jump if K ≤ K_i.
  12           INC1  0,3       C2        C4. Increase i.
  13           J3NZ  5B        C2        Jump if DELTA[j] ≠ 0.
  14  FAILURE  EQU   *         1−S       Exit if not in table. |

In a successful search, this algorithm corresponds to a binary tree with the same internal path length as the tree of Algorithm B, so the average number of comparisons $C_N$ is the same as before. In an unsuccessful search, Algorithm C always makes exactly $\lfloor \lg N\rfloor + 1$ comparisons. The total running time of Program C is not quite symmetrical between left and right branches, since C1 is weighted more heavily than C2, but exercise 11 shows that we have $K < K_i$ roughly as often as $K > K_i$; hence Program C takes approximately

$$\begin{aligned}
&(8.5 \lg N - 6)u &&\text{for a successful search,}\\
&(8.5 \lfloor \lg N\rfloor + 12)u &&\text{for an unsuccessful search.}
\end{aligned} \tag{7}$$


This is more than twice as fast as Program B, without using any special properties of binary computers, even though the running times (5) for Program B assume that MIX has a "shift right binary" instruction.

Another modification of binary search, suggested in 1971 by L. E. Shar, will be still faster on some computers, because it is uniform after the first step, and it requires no table. The first step is to compare $K$ with $K_i$, where $i = 2^k$, $k = \lfloor \lg N\rfloor$. If $K < K_i$, we use a uniform search with the $\delta$'s equal to $2^{k-1}, 2^{k-2}, \ldots, 1, 0$. On the other hand, if $K > K_i$ we reset $i$ to $i' = N + 1 - 2^l$, where $l = \lceil \lg(N - 2^k + 1)\rceil$, and pretend that the first comparison was actually $K > K_{i'}$, using a uniform search with the $\delta$'s equal to $2^{l-1}, 2^{l-2}, \ldots, 1, 0$.

Shar's method is illustrated for $N = 10$ in Fig. 7. Like the previous algorithms, it never makes more than $\lfloor \lg N\rfloor + 1$ comparisons; hence it makes at most one more than the minimum possible average number of comparisons, in spite of the fact that it occasionally goes through several redundant steps in succession (see exercise 12).
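A Python sketch of Shar's method follows, assuming `keys` is a sorted list of $K_1, \ldots, K_N$ and using 1-based positions as in the text; `shar_search` is a hypothetical name. Both branches run the same uniform loop, so the only special code is the choice of starting point and first $\delta$.

```python
def shar_search(keys, K):
    """Shar's method sketch: 1-based index of K in the sorted list keys, else None."""
    N = len(keys)
    if N == 0:
        return None
    k = N.bit_length() - 1                   # k = floor(lg N)
    i = 1 << k                               # first probe at i = 2**k
    if K == keys[i - 1]:
        return i
    if K < keys[i - 1]:
        delta, go_right = i >> 1, False      # deltas 2**(k-1), ..., 1, 0
    else:
        l = (N - (1 << k)).bit_length()      # l = ceil(lg(N - 2**k + 1))
        i = N + 1 - (1 << l)                 # pretend we just found K > K_i'
        delta, go_right = (1 << l) >> 1, True  # deltas 2**(l-1), ..., 1, 0
    while delta > 0:                         # reaching delta 0 means failure
        i = i + delta if go_right else i - delta
        if K == keys[i - 1]:
            return i
        go_right = K > keys[i - 1]
        delta >>= 1
    return None


print(shar_search(list(range(10, 110, 10)), 70))   # 7
```

In the right-hand branch the probes stay between $i' + 1$ and $N$, so the pretended comparison with $K_{i'}$ is never actually carried out.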
Fig. 8. The Fibonacci tree of order 6.

Still  another  modification  of  binary  search,  which  increases  the  speed  of  all 
the  methods  above  when  N is  extremely  large,  is  discussed  in  exercise  23.  See 
also  exercise  24,  for  a method  that  is  faster  yet. 

*Fibonaccian  search.  In  the  polyphase  merge  we  have  seen  that  the  Fibonacci 
numbers  can  play  a role  analogous  to  the  powers  of  2.  A similar  phenomenon 
occurs  in  searching,  where  Fibonacci  numbers  provide  us  with  an  alternative  to 
binary  search.  The  resulting  method  is  preferable  on  some  computers,  because  it 
involves  only  addition  and  subtraction,  not  division  by  2.  The  procedure  we  are 
about  to  discuss  should  be  distinguished  from  an  important  numerical  procedure 
called  “Fibonacci  search,”  which  is  used  to  locate  the  maximum  of  a unimodal 
function  [see  Fibonacci  Quarterly  4 (1966),  265-269];  the  similarity  of  names 
has  led  to  some  confusion. 

The  Fibonaccian  search  technique  looks  very  mysterious  at  first  glance,  if 
we  simply  take  the  program  and  try  to  explain  what  is  happening;  it  seems  to 
work  by  magic.  But  the  mystery  disappears  as  soon  as  the  corresponding  search 
tree  is  displayed.  Therefore  we  shall  begin  our  study  of  the  method  by  looking 
at  Fibonacci  trees. 

Figure 8 shows the Fibonacci tree of order 6. It looks somewhat more like a real-life shrub than the other trees we have been considering, perhaps because many natural processes satisfy a Fibonacci law. In general, the Fibonacci tree of order $k$ has $F_{k+1} - 1$ internal (circular) nodes and $F_{k+1}$ external (square) nodes, and it is constructed as follows:

If $k = 0$ or $k = 1$, the tree is simply [0].

If $k \ge 2$, the root is ($F_k$); the left subtree is the Fibonacci tree of order $k - 1$; and the right subtree is the Fibonacci tree of order $k - 2$ with all numbers increased by $F_k$.

Except for the external nodes, the numbers on the two children of each internal node differ from their parent's number by the same amount, and this amount is a Fibonacci number. For example, $5 = 8 - F_4$ and $11 = 8 + F_4$ in Fig. 8. When the difference is $F_j$, the corresponding Fibonacci difference for the next branch on the left is $F_{j-1}$, while on the right it skips down to $F_{j-2}$. For example, $3 = 5 - F_3$ while $10 = 11 - F_2$.
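The two construction rules translate directly into a recursive function. This Python sketch (helper names hypothetical) represents an external node by `None` and an internal node by a tuple `(label, left, right)`, and uses $F_0 = 0$, $F_1 = 1$.

```python
def fibonacci(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fibonacci_tree(k, offset=0):
    """Fibonacci tree of order k: None is an external node, else (label, left, right)."""
    if k <= 1:
        return None
    root = fibonacci(k)
    return (offset + root,
            fibonacci_tree(k - 1, offset),           # order k - 1
            fibonacci_tree(k - 2, offset + root))    # order k - 2, shifted by F_k


def labels_in_order(tree):
    """Internal-node labels read from left to right."""
    if tree is None:
        return []
    label, left, right = tree
    return labels_in_order(left) + [label] + labels_in_order(right)


t6 = fibonacci_tree(6)
print(t6[0], t6[1][0], t6[2][0])             # 8 5 11
```

Reading the labels of `fibonacci_tree(k)` from left to right gives $1, 2, \ldots, F_{k+1} - 1$, which is why the tree can serve as a search tree for a table of $N = F_{k+1} - 1$ keys.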

If we combine these observations with an appropriate mechanism for recognizing the external nodes, we arrive at the following method:

Algorithm F (Fibonaccian search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches for a given argument $K$.

For convenience in description, we assume that $N + 1$ is a perfect Fibonacci number, $F_{k+1}$. It is not difficult to make the method work for arbitrary $N$, if a suitable initialization is provided (see exercise 14).

F1. [Initialize.] Set $i \gets F_k$, $p \gets F_{k-1}$, $q \gets F_{k-2}$. (Throughout the algorithm, $p$ and $q$ will be consecutive Fibonacci numbers.)

F2. [Compare.] If $K < K_i$, go to step F3; if $K > K_i$, go to F4; and if $K = K_i$, the algorithm terminates successfully.

F3. [Decrease $i$.] If $q = 0$, the algorithm terminates unsuccessfully. Otherwise set $i \gets i - q$, and set $(p, q) \gets (q, p - q)$; then return to F2.

F4. [Increase $i$.] If $p = 1$, the algorithm terminates unsuccessfully. Otherwise set $i \gets i + q$, $p \gets p - q$, then $q \gets q - p$, and return to F2. |
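Here is a Python sketch of Algorithm F under the same simplifying assumption as the text, namely that $N + 1$ is a Fibonacci number; the sketch raises `ValueError` otherwise rather than attempting the extension of exercise 14. The name `fibonaccian_search` is hypothetical, `keys` is a sorted list holding $K_1, \ldots, K_N$, and positions are 1-based.

```python
def fibonaccian_search(keys, K):
    """Algorithm F sketch: 1-based index of K in the sorted list keys, else None."""
    N = len(keys)
    fib = [0, 1]                             # fib[n] == F_n
    while fib[-1] < N + 1:
        fib.append(fib[-1] + fib[-2])
    if N < 1 or fib[-1] != N + 1:
        raise ValueError("this sketch needs N + 1 to be a Fibonacci number")
    k = len(fib) - 2                         # so that fib[k + 1] == N + 1
    i, p, q = fib[k], fib[k - 1], fib[k - 2]  # F1
    while True:
        key = keys[i - 1]                    # K_i, with 1-based i
        if K < key:                          # F3: decrease i
            if q == 0:
                return None
            i -= q
            p, q = q, p - q
        elif K > key:                        # F4: increase i
            if p == 1:
                return None
            i += q
            p -= q
            q -= p
        else:                                # F2: K == K_i
            return i


print(fibonaccian_search(list(range(1, 13)), 9))   # 9
```

Only additions and subtractions appear inside the loop, which is the property that makes the method attractive on machines where halving is comparatively slow. With $N = 12$ the probes for $K = 9$ are $K_8$, $K_{11}$, $K_{10}$, $K_9$.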

The following MIX implementation gains speed by making two copies of the inner loop, one in which $p$ is in rI2 and $q$ in rI3, and one in which the registers are reversed; this simplifies step F3. In fact, the program actually keeps $p - 1$ and $q - 1$ in the registers, instead of $p$ and $q$, in order to simplify the test "$p = 1$?" in step F4.

Program F (Fibonaccian search). We follow the previous conventions, with rA ≡ K, rI1 ≡ i, (rI2 or rI3) ≡ p − 1, (rI3 or rI2) ≡ q − 1.

  01  START  LDA   K           1          F1. Initialize.
  02         ENT1  F_k         1          i ← F_k.
  03         ENT2  F_{k-1}-1   1          p ← F_{k−1}.
  04         ENT3  F_{k-2}-1   1          q ← F_{k−2}.
  05         JMP   F2A         1          To step F2.
  06  F4A    INC1  1,3         C2−S−A     F4. Increase i. i ← i + q.
  07         DEC2  1,3         C2−S−A     p ← p − q.
  08         DEC3  1,2         C2−S−A     q ← q − p.
  09  F2A    CMPA  KEY,1       C          F2. Compare.
  10         JL    F3A         C          To F3 if K < K_i.
  11         JE    SUCCESS     C2         Exit if K = K_i.
  12         J2NZ  F4A         C2−S       To F4 if p ≠ 1.
  13         JMP   FAILURE     A          Exit if not in table.
  14  F3A    DEC1  1,3         C1         F3. Decrease i. i ← i − q.
  15         DEC2  1,3         C1         p ← p − q.
  16         J3NN  F2B         C1         Swap registers if q > 0.
  17         JMP   FAILURE     1−S−A      Exit if not in table.
  18  F4B    INC1  1,2
  19         DEC3  1,2
  20         DEC2  1,3
  21  F2B    CMPA  KEY,1
  22         JL    F3B
  23         JE    SUCCESS
  24         J3NZ  F4B
  25         JMP   FAILURE
  26  F3B    DEC1  1,2
  27         DEC3  1,2
  28         J2NN  F2A
  29         JMP   FAILURE
The running time of this program is analyzed in exercise 18. Figure 8 shows, and the analysis proves, that a left branch is taken somewhat more often than a right branch. Let $C$, $C1$, and $(C2 - S)$ be the respective number of times steps F2, F3, and F4 are performed. Then we have

$$\begin{aligned}
C &= \bigl(\text{ave } \phi k/\sqrt5 + O(1),\ \max k - 1\bigr),\\
C1 &= \bigl(\text{ave } k/\sqrt5 + O(1),\ \max k - 1\bigr),\\
C2 - S &= \bigl(\text{ave } \phi^{-1}k/\sqrt5 + O(1),\ \max \lfloor k/2\rfloor\bigr).
\end{aligned} \tag{8}$$

Thus the left branch is taken about $\phi \approx 1.618$ times as often as the right branch (a fact that we might have guessed, since each probe divides the remaining interval into two parts, with the left part about $\phi$ times as large as the right). The total average running time of Program F therefore comes to approximately

$$\tfrac15\bigl((18 + 4\phi)k + 31 - 26\phi\bigr)u \approx (7.050 \lg N + 1.08)u \tag{9}$$

for a successful search, plus $(9 - 3\phi)u \approx 4.15u$ for an unsuccessful search. This is faster than Program C, although the worst-case running time (roughly $8.6 \lg N$) is slightly slower.
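As a rough check on the constants in (9), here is a worked calculation resting on one stated assumption: Binet's approximation $F_{k+1} \approx \phi^{k+1}/\sqrt5$, applied to $N + 1 = F_{k+1}$, with $N + 1$ replaced by $N$ (a change that affects only lower-order terms).

$$\begin{aligned}
k &\approx \log_\phi\bigl(\sqrt5\,N\bigr) - 1 = (\log_\phi 2)\lg N + \log_\phi\sqrt5 - 1 \approx 1.44042\lg N + 0.67228,\\
\tfrac15(18 + 4\phi) &\approx 4.89443, \qquad \tfrac15(31 - 26\phi) \approx -2.21378,\\
\tfrac15\bigl((18 + 4\phi)k + 31 - 26\phi\bigr) &\approx 4.89443\,(1.44042\lg N + 0.67228) - 2.21378 \approx 7.050\lg N + 1.077.
\end{aligned}$$

Similarly, $9 - 3\phi \approx 9 - 4.854 = 4.146$.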


Interpolation  search.  Let’s  forget  computers  for  a moment,  and  consider  how 
people  actually  carry  out  a search.  Sometimes  everyday  life  provides  us  with 
clues  that  lead  to  good  algorithms. 

Imagine  yourself  looking  up  a word  in  a dictionary.  You  probably  don’t 
begin  by  looking  first  at  the  middle  page,  then  looking  at  the  1/4  or  3/4  point, 
etc.,  as  in  a binary  search.  It’s  even  less  likely  that  you  use  a Fibonaccian  search! 

If  the  word  you  want  starts  with  the  letter  A,  you  probably  begin  near  the 
front  of  the  dictionary.  In  fact,  many  dictionaries  have  thumb  indexes  that  show 
the  starting  page  or  the  middle  page  for  the  words  beginning  with  a fixed  letter. 
This  thumb-index  technique  can  readily  be  adapted  to  computers,  and  it  will 
speed  up  the  search;  such  algorithms  are  explored  in  Section  6.3. 

Yet  even  after  the  initial  point  of  search  has  been  found,  your  actions  still 
are  not  much  like  the  methods  we  have  discussed.  If  you  notice  that  the  desired 
word  is  alphabetically  much  greater  than  the  words  on  the  page  being  examined, 
you  will  turn  over  a fairly  large  chunk  of  pages  before  making  the  next  reference. 
This is quite different from the algorithms above, which make no distinction between “much greater” and “slightly greater.”

Such considerations suggest an algorithm that might be called interpolation search: When we know that $K$ lies between $K_l$ and $K_u$, we can choose the next probe to be about $(K - K_l)/(K_u - K_l)$ of the way between $l$ and $u$, assuming that the keys are numeric and that they increase in a roughly constant manner throughout the interval.
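The probing rule can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: it assumes a sorted list of distinct integer keys, uses 0-based positions (unlike the 1-based convention of the text), and the name `interpolation_search` is hypothetical. Integer floor division keeps each probe inside the current interval.

```python
def interpolation_search(keys, K):
    """Interpolation-search sketch for a sorted list of distinct integers.

    Returns the 0-based position of K, or None if K is absent.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= K <= keys[hi]:
        if keys[hi] == keys[lo]:             # distinct keys: lo == hi here
            return lo if keys[lo] == K else None
        # Probe about (K - K_lo)/(K_hi - K_lo) of the way from lo to hi.
        i = lo + (K - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[i] < K:
            lo = i + 1
        elif keys[i] > K:
            hi = i - 1
        else:
            return i
    return None


print(interpolation_search([10, 20, 30, 40, 50], 40))   # 3
```

On evenly spaced keys like these, the first probe lands exactly on the target. A practical version might switch to binary search once the interval is small; that refinement is omitted here.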

Interpolation search is asymptotically superior to binary search. One step of binary search essentially reduces the amount of uncertainty from $n$ to $\frac12 n$, while one step of interpolation search essentially reduces it to $\sqrt n$, when the keys in the table are randomly distributed. Hence interpolation search takes about $\lg\lg N$ steps, on the average, to reduce the uncertainty from $N$ to 2. (See exercise 22.)
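The $\lg\lg N$ estimate comes from a one-line calculation under the idealized assumption that every step maps an uncertainty of $n$ to exactly $\sqrt n$. After $j$ steps the uncertainty is $N^{1/2^j}$, and

$$N^{1/2^j} = 2 \iff 2^{-j}\lg N = 1 \iff j = \lg\lg N.$$

By contrast, exact halving needs $j = \lg N - 1$ steps to get from $N$ down to 2; for $N = 2^{16}$ this is 15 steps, against $\lg\lg N = 4$.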

However, computer simulation experiments show that interpolation search does not decrease the number of comparisons enough to compensate for the extra computing time involved, unless the table is rather large. Typical files aren't sufficiently random, and the difference between $\lg\lg N$ and $\lg N$ is not substantial unless $N$ exceeds, say, $2^{16} = 65{,}536$. Interpolation is most successful in the early stages of searching a large, possibly external, file; after the range has been narrowed down, binary search finishes things off more quickly. (Note that dictionary lookup by hand is essentially an external, not an internal, search. We shall discuss external searching later.)


History  and  bibliography.  The  earliest  known  example  of  a long  list  of  items 
that  was  sorted  into  order  to  facilitate  searching  is  the  remarkable  Babylonian 
reciprocal  table  of  Inakibit-Anu,  dating  from  about  200  B.C.  This  clay  tablet 
contains  more  than  100  pairs  of  values,  which  appear  to  be  the  beginning  of 
a list  of  approximately  500  multiple-precision  sexagesimal  numbers  and  their 
reciprocals,  sorted  into  lexicographic  order.  For  example,  the  list  included  the 
following  sequence  of  entries: 


  01 13 09 34 29 08 08 53 20      49 12 27
  01 13 14 31 52 30               49 09 07 12
  01 13 43 40 48                  48 49 41 15
  01 13 48 40 30                  48 46 22 59 25 25 55 33 20
  01 14 04 26 40                  48 36


The task of sorting 500 entries like this, given the technology available at that time, must have been phenomenal. [See D. E. Knuth, Selected Papers on Computer Science (Cambridge Univ. Press, 1996), Chapter 11, for further details.]

It is fairly natural to sort numerical values into order, but an order relation between letters or words does not suggest itself so readily. Yet a collating sequence for individual letters was present already in the most ancient alphabets. For example, many of the Biblical psalms have verses that follow a strict alphabetic sequence, the first verse starting with aleph, the second with beth, etc.; this was an aid to memory. Eventually the standard sequence of letters was used by Semitic and Greek peoples to denote numerals; for example, α, β, γ stood for 1, 2, 3, respectively.
The use of alphabetic order for entire words seems to be a much later invention; it is something we might think is obvious, yet it has to be taught to children, and at some point in history it was necessary to teach it to adults. Several lists from about 300 B.C. have been found on the Aegean Islands, giving the names of people in certain religious cults; these lists have been alphabetized, but only by the first letter, thus representing only the first pass of a left-to-right radix sort. Some Greek papyri from the years A.D. 134–135 contain
fragments  of  ledgers  that  show  the  names  of  taxpayers  alphabetized  by  the  first 
two  letters.  Apollonius  Sophista  used  alphabetic  order  on  the  first  two  letters, 
and  often  on  subsequent  letters,  in  his  lengthy  concordance  of  Homer’s  poetry 
(first  century  A.D.).  A few  examples  of  more  perfect  alphabetization  are  known, 
notably  Galen’s  Hippocratic  Glosses  (c.  200),  but  they  are  very  rare.  Words  were 
arranged  by  their  first  letter  only  in  the  Etymologiarum  of  St.  Isidorus  (c.  630, 
Book  x);  and  the  Corpus  Glossary  (c.  725)  used  only  the  first  two  letters  of  each 
word.  The  latter  two  works  were  perhaps  the  largest  nonnumerical  files  of  data 
to  be  compiled  during  the  Middle  Ages. 

It  is  not  until  Giovanni  di  Genoa’s  Catholicon  (1286)  that  we  find  a specific 
description  of  true  alphabetical  order.  In  his  preface,  Giovanni  explained  that 


  amo             precedes   bibo
  abeo            precedes   adeo
  amatus          precedes   amor
  imprudens       precedes   impudens
  iusticia        precedes   iustus
  polisintheton   precedes   polissenus

(thereby  giving  examples  of  situations  in  which  the  ordering  is  determined  by  the 
1st,  2nd,  . . . , 6th  letters),  “and  so  in  like  manner.”  He  remarked  that  strenuous 
effort  was  required  to  devise  these  rules.  “I  beg  of  you,  therefore,  good  reader, 
do  not  scorn  this  great  labor  of  mine  and  this  order  as  something  worthless.” 

A detailed  study  of  the  development  of  alphabetic  order,  up  to  the  time 
printing  was  invented,  has  been  made  by  Lloyd  W.  Daly  [ Collection  Latomus 
90  (1967),  100  pages].  He  found  some  interesting  old  manuscripts  that  were 
evidently  used  as  worksheets  while  sorting  words  by  their  first  letters  (see  pages 
89-90  of  his  monograph). 

The first dictionary of English, Robert Cawdrey's Table Alphabeticall (London, 1604), contains the following instructions:

Nowe  if  the  word,  which  thou  art  desirous  to  finde,  beginne  with  (a)  then 
looke  in  the  beginning  of  this  Table,  but  if  with  (v)  looke  towards  the  end. 
Againe,  if  thy  word  beginne  with  (ca)  looke  in  the  beginning  of  the  letter 
(c)  but  if  with  (cu)  then  looke  toward  the  end  of  that  letter.  And  so  of  all 
the  rest.  &c. 

Cawdrey  seems  to  have  been  teaching  himself  how  to  alphabetize  as  he  prepared 
his  dictionary;  numerous  misplaced  words  appear  on  the  first  few  pages,  but  the 
alphabetic  order  in  the  last  part  is  not  as  bad. 
Binary search was first mentioned by John Mauchly, in what was perhaps the first published discussion of nonnumerical programming methods [Theory and Techniques for the Design of Electronic Digital Computers, edited by G. W. Patterson, 1 (1946), 9.7–9.8; 3 (1946), 22.8–22.9]. The method became well known to programmers, but nobody seems to have worked out the details of what should be done when $N$ does not have the special form $2^n - 1$. [See A. D. Booth, Nature
176  (1955),  565;  A.  I.  Dumey,  Computers  and  Automation  5 (December  1956),  7, 
where  binary  search  is  called  “Twenty  Questions”;  Daniel  D.  McCracken,  Digital 
Computer  Programming  (Wiley,  1957),  201-203;  and  M.  Halpern,  CACM  1, 1 
(February  1958),  1-3.] 

D.  H.  Lehmer  [Proc.  Symp.  Appl.  Math.  10  (1960),  180-181]  was  apparently 
the  first  to  publish  a binary  search  algorithm  that  works  for  all  N.  The  next 
step  was  taken  by  H.  Bottenbruch  [JACM  9 (1962),  214],  who  presented  an 
interesting  variation  of  Algorithm  B that  avoids  a separate  test  for  equality  until 
the very end: Using

$$i \gets \lceil (l + u)/2\rceil$$

instead of $i \gets \lfloor (l + u)/2\rfloor$ in step B2, he set $l \gets i$ whenever $K \ge K_i$; then $u - l$ decreases at every step. Eventually, when $l = u$, we have $K_l \le K < K_{l+1}$, and we can test whether or not the search was successful by making one more comparison. (He assumed that $K \ge K_1$ initially.) This idea speeds up the inner loop slightly on many computers, and the same principle can be used with all of the algorithms we have discussed in this section; but a successful search will require about one more iteration, on the average, because of (2). Since the inner loop is performed only about $\lg N$ times, this tradeoff between an extra iteration and a faster loop does not save time unless $N$ is extremely large. (See exercise 23.)
On  the  other  hand  Bottenbruch’s  algorithm  will  find  the  rightmost  occurrence  of 
a given  key  when  the  table  contains  duplicates,  and  this  property  is  occasionally 
important. 
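Bottenbruch's variation, as described above, can be sketched in Python. This is an illustrative reconstruction from the prose, not his published code. The name `bottenbruch_search` is hypothetical; the sketch checks the assumption $K \ge K_1$ explicitly instead of relying on it, and it sets $u \gets i - 1$ when $K < K_i$, which the description implies but does not state.

```python
def bottenbruch_search(keys, K):
    """Return the 1-based index of the rightmost K in the sorted list keys, else None."""
    N = len(keys)
    if N == 0 or K < keys[0]:
        return None                          # Bottenbruch assumed K >= K_1
    l, u = 1, N
    while l < u:                             # no equality test inside the loop
        i = (l + u + 1) // 2                 # i = ceil((l + u)/2), so l < i <= u
        if K < keys[i - 1]:
            u = i - 1
        else:                                # K >= K_i
            l = i
    # Now l == u and K_l <= K < K_{l+1}; one final comparison decides.
    return l if keys[l - 1] == K else None


print(bottenbruch_search([1, 2, 2, 2, 3], 2))   # 4: the rightmost 2
```

Because $l$ moves right whenever $K \ge K_i$, equal keys never stop the loop early, which is why the rightmost duplicate is the one found.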

K. E. Iverson [A Programming Language (Wiley, 1962), 141] gave the procedure of Algorithm B, but without considering the possibility of an unsuccessful search. D. E. Knuth [CACM 6 (1963), 556–558] presented Algorithm B as an example used with an automated flowcharting system. The uniform binary search, Algorithm C, was suggested to the author by A. K. Chandra of Stanford University in 1971.

Fibonaccian  searching  was  invented  by  David  E.  Ferguson  [CACM  3 (1960), 
648].  Binary  trees  similar  to  Fibonacci  trees  appeared  in  the  pioneering  work 
of  the  Norwegian  mathematician  Axel  Thue  as  early  as  1910  (see  exercise  28). 
A Fibonacci  tree  without  labels  was  also  exhibited  as  a curiosity  in  the  first 
edition  of  Hugo  Steinhaus’s  popular  book  Mathematical  Snapshots  (New  York: 
Stechert,  1938),  page  28;  he  drew  it  upside  down  and  made  it  look  like  a real 
tree,  with  right  branches  twice  as  long  as  left  branches  so  that  all  the  leaves 
would  occur  at  the  same  level. 

Interpolation searching was suggested by W. W. Peterson [IBM J. Res. & Devel. 1 (1957), 131–132]. A correct analysis of its average behavior was not discovered until many years later (see exercise 22).
EXERCISES

► 1. [21] Prove that if $u < l$ in step B2 of the binary search, we have $u = l - 1$ and $K_u < K < K_l$. (Assume by convention that $K_0 = -\infty$ and $K_{N+1} = +\infty$, although these artificial keys are never really used by the algorithm so they need not be present in the actual table.)


► 2. [22] Would Algorithm B still work properly when $K$ is present in the table if we (a) changed step B5 to "$l \gets i$" instead of "$l \gets i + 1$"? (b) changed step B4 to "$u \gets i$" instead of "$u \gets i - 1$"? (c) made both of these changes?


3. [15] What searching method corresponds to the tree [tree figure not reproduced]? What is the average number of comparisons made in a successful search? in an unsuccessful search?


4. [20] If a search using Program 6.1S (sequential search) takes exactly 638 units of time, how long does it take with Program B (binary search)?

5. [M24] For what values of $N$ is Program B actually slower than a sequential search (Program 6.1Q′) on the average, assuming that the search is successful?

6.  [28]  (K.  E.  Iverson.)  Exercise  5 suggests  that  it  would  be  best  to  have  a hybrid 
method,  changing  from  binary  search  to  sequential  search  when  the  remaining  interval 
has  length  less  than  some  judiciously  chosen  value.  Write  an  efficient  MIX  program  for 
such  a search  and  determine  the  best  changeover  value. 

► 7. [M22] Would Algorithm U still work properly if we changed step U1 so that

a) both $i$ and $m$ are set equal to $\lfloor N/2\rfloor$?

b) both $i$ and $m$ are set equal to $\lceil N/2\rceil$?

[Hint: Suppose the first step were "Set $i \gets 0$, $m \gets N$ (or $N + 1$), go to U4."]

8. [M20] Let $\delta_j$ = DELTA[$j$] be the $j$th increment in Algorithm C, as defined in (6).

a) What is the sum $\sum_{j \ge 1} \delta_j$?

b) What are the minimum and maximum values of $i$ that can occur in step C2?

9. [20] Is there any value of $N > 1$ for which Algorithms B and C are exactly equivalent, in the sense that they will both perform the same sequence of comparisons for all search arguments?

10. [21] Explain how to write a MIX program for Algorithm C containing approximately $7 \lg N$ instructions and having a running time of about $4.5 \lg N$ units.

11. [M26] Find exact formulas for the average values of C1, C2, and A in the frequency analysis of Program C, as a function of $N$ and $S$.

12.  [20]  Draw  the  binary  search  tree  corresponding  to  Shar’s  method  when  N = 12. 

13. [M24] Tabulate the average number of comparisons made by Shar's method, for $1 \le N \le 16$, considering both successful and unsuccessful searches.

14. [21] Explain how to extend Algorithm F so that it will apply for all $N \ge 1$.

15.  [M19]  For  what  values  of  k does  the  Fibonacci  tree  of  order  k define  an  optimal 
search  procedure,  in  the  sense  that  the  fewest  comparisons  are  made  on  the  average? 
16. [21] Figure 9 shows the lineal chart of the rabbits in Fibonacci's original rabbit
problem  (see  Section  1.2.8).  Is  there  a simple  relationship  between  this  and  the 
Fibonacci  tree  discussed  in  the  text? 

[Fig. 9: chart of rabbit pairs, one row per generation: Initial pair, First month, Second month, Third month, Fourth month, Fifth month, Sixth month.]

Fig. 9. Pairs of rabbits breeding by Fibonacci's rule.

17. [M21] From exercise 1.2.8-34 (or exercise 5.4.2-10) we know that every positive
integer n has a unique representation as a sum of Fibonacci numbers

$$n = F_{a_1} + F_{a_2} + \cdots + F_{a_r},$$

where $r \ge 1$, $a_j \ge a_{j+1} + 2$ for $1 \le j < r$, and $a_r \ge 2$. Prove that in the Fibonacci tree
of order k, the path from the root to node (n) has length $k + 1 - r - a_r$.

18. [M30] Find exact formulas for the average values of C1, C2, and A in the
frequency analysis of Program F, as a function of k, $F_k$, $F_{k+1}$, and S.

19. [M42] Carry out a detailed analysis of the average running time of the algorithm
suggested in exercise 14.

20. [M22] The number of comparisons required in a binary search is approximately
$\log_2 N$, and in the Fibonaccian search it is roughly $(\phi/\sqrt5)\log_\phi N$. The purpose of this
exercise is to show that these formulas are special cases of a more general result.

Let p and q be positive numbers with p + q = 1. Consider a search algorithm that,
given a table of N numbers in increasing order, starts by comparing the argument with
the (pN)th key, and iterates this procedure on the smaller blocks. (The binary search
has p = q = 1/2; the Fibonaccian search has $p = 1/\phi$, $q = 1/\phi^2$.)

If C(N) denotes the average number of comparisons required to search a table of
size N, it approximately satisfies the relations

$$C(1) = 0;\qquad C(N) = 1 + pC(pN) + qC(qN)\quad\text{for } N > 1.$$

This happens because there is probability p (roughly) that the search reduces to a
pN-element search, and probability q that it reduces to a qN-element search, after the
first comparison. When N is large, we may ignore the small-order effect caused by the
fact that pN and qN aren't exactly integers.

a) Show that $C(N) = \log_b N$ satisfies these relations exactly, for a certain choice of b.
For binary and Fibonaccian search, this value of b agrees with the formulas derived
earlier.

b) Consider the following argument: "With probability p, the size of the interval
being scanned in this algorithm is divided by 1/p; with probability q, the interval
size is divided by 1/q. Therefore the interval is divided by $p\cdot(1/p) + q\cdot(1/q) = 2$
on the average, so the algorithm is exactly as good as the binary search, regardless
of p and q." Is there anything wrong with this reasoning?
21. [20] Draw the binary tree corresponding to interpolation search when N = 10.

22. [M41] (A. C. Yao and F. F. Yao.) Show that an appropriate formulation of
interpolation search requires asymptotically lg lg N comparisons, on the average, when
applied to N independent uniform random keys that have been sorted. Furthermore,
all search algorithms on such tables must make asymptotically lg lg N comparisons, on
the average.

► 23. [25] The binary search algorithm of H. Bottenbruch, mentioned at the close of
this section, avoids testing for equality until the very end of the search. (During the
algorithm we know that $K_l \le K < K_{u+1}$, and the case of equality is not examined
until l = u.) Such a trick would make Program B run a little bit faster for large N,
since the "JE" instruction could be removed from the inner loop. (However, the idea
wouldn't really be practical since lg N is always rather small; we would need $N > 2^{66}$
in order to compensate for the extra work necessary on a successful search, because the
running time $(18\lg N - 16)u$ of (5) is "decreased" to $(17.5\lg N + 17)u$!)

Show that every search algorithm corresponding to a binary tree can be adapted to
a search algorithm that uses two-way branching (< versus ≥) at the internal nodes of
the tree, in place of the three-way branching (<, =, or >) used in the text's discussion.
In particular, show how to modify Algorithm C in this way.

► 24. [23] We have seen in Sections 2.3.4.5 and 5.2.3 that the complete binary tree is
a convenient  way  to  represent  a minimum-path-length  tree  in  consecutive  locations. 
Devise  an  efficient  search  method  based  on  this  representation.  [Hint:  Is  it  possible  to 
use  multiplication  by  2 instead  of  division  by  2 in  a binary  search?] 

► 25. [M25] Suppose that a binary tree has $a_k$ internal nodes and $b_k$ external nodes
on level k, for k = 0, 1, .... (The root is at level zero.) Thus in Fig. 8 we have
$(a_0, a_1, \ldots, a_5) = (1, 2, 4, 4, 1, 0)$ and $(b_0, b_1, \ldots, b_5) = (0, 0, 0, 4, 7, 2)$.

a) Show that a simple algebraic relationship holds between the generating functions
$A(z) = \sum_k a_k z^k$ and $B(z) = \sum_k b_k z^k$.

b) The probability distribution for a successful search in a binary tree has the
generating function $g(z) = zA(z)/N$, and for an unsuccessful search the generating
function is $h(z) = B(z)/(N + 1)$. (Thus in the text's notation we have $C_N =$
mean(g), $C'_N$ = mean(h), and Eq. (2) gives a relation between these quantities.)
Find a relation between var(g) and var(h).

26.  [22]  Show  that  Fibonacci  trees  are  related  to  polyphase  merge  sorting  on  three 
tapes. 

27. [M30] (H. S. Stone and John Linn.) Consider a search process that uses k
processors simultaneously and that is based solely on comparisons of keys. Thus at
every step of the search, k indices $i_1, \ldots, i_k$ are specified, and we perform k simultaneous
comparisons; if $K = K_{i_j}$ for some j, the search terminates successfully, otherwise
the search proceeds to the next step based on the $2^k$ possible outcomes $K < K_{i_j}$ or
$K > K_{i_j}$, for $1 \le j \le k$.

Prove that such a process must always take at least approximately $\log_{k+1} N$ steps
on the average, as $N \to \infty$, assuming that each key of the table is equally likely as a
search argument. (Hence the potential increase in speed over 1-processor binary search
is only a factor of $\lg(k + 1)$, not the factor of k we might expect. In this sense it is more
efficient to assign each processor to a different, independent search problem, instead of
making them cooperate on a single search.)
28. [M23] Define Thue trees $T_n$ by means of algebraic expressions in a binary
operator $*$ as follows: $T_0(x) = x * x$, $T_1(x) = x$, $T_{n+2}(x) = T_{n+1}(x) * T_n(x)$.

a) The number of leaves of $T_n$ is the number of occurrences of x when $T_n(x)$ is written
out in full. Express this number in terms of Fibonacci numbers.

b) Prove that if the binary operator $*$ satisfies the axiom

$$((x * x) * x) * ((x * x) * x) = x,$$

then $T_m(T_n(x)) = T_{m+n-1}(x)$ for all $m \ge 0$ and $n \ge 1$.

► 29. [22] (Paul Feldman, 1985.) Instead of assuming that $K_1 < K_2 < \cdots < K_N$,
assume only that $K_{p(1)} < K_{p(2)} < \cdots < K_{p(N)}$ where the permutation $p(1)p(2)\ldots p(N)$
is an involution, and p(j) = j for all even values of j. Show that we can locate any given
key K, or determine that K is not present, by making at most $2\lfloor\lg N\rfloor + 1$ comparisons.

30. [27] (Involution coding.) Using the idea of the previous exercise, find a way to
arrange N distinct keys in such a way that their relative order implicitly encodes an
arbitrarily given array of t-bit numbers $x_1, x_2, \ldots, x_m$, when $m < N/4 + 1 - 2^t$.
With your arrangement it should be possible to determine the leading k bits of $x_j$ by
making only k comparisons, for any given j, as well as to look up an arbitrary key with
$\le 2\lfloor\lg N\rfloor + 1$ comparisons. (This result is used in theoretical studies of data structures
that are asymptotically efficient in both time and space.)

6.2.2.  Binary  Tree  Searching 

In  the  preceding  section,  we  learned  that  an  implicit  binary  tree  structure  makes 
the  behavior  of  binary  search  and  Fibonaccian  search  easier  to  understand.  For  a 
given  value  of  N,  the  tree  corresponding  to  binary  search  achieves  the  theoretical 
minimum  number  of  comparisons  that  are  necessary  to  search  a table  by  means 
of  key  comparisons.  But  the  methods  of  the  preceding  section  are  appropriate 
mainly  for  fixed-size  tables,  since  the  sequential  allocation  of  records  makes 
insertions  and  deletions  rather  expensive.  If  the  table  is  changing  dynamically, 
we  might  spend  more  time  maintaining  it  than  we  save  in  binary-searching  it. 

The  use  of  an  explicit  binary  tree  structure  makes  it  possible  to  insert  and 
delete  records  quickly,  as  well  as  to  search  the  table  efficiently.  As  a result,  we 
essentially  have  a method  that  is  useful  both  for  searching  and  for  sorting.  This 
gain  in  flexibility  is  achieved  by  adding  two  link  fields  to  each  record  of  the  table. 

Techniques for searching a growing table are often called symbol table algorithms,
because assemblers and compilers and other system routines generally
use  such  methods  to  keep  track  of  user-defined  symbols.  For  example,  the  key  of 
each  record  within  a compiler  might  be  a symbolic  identifier  denoting  a variable 
in  some  FORTRAN  or  C program,  and  the  rest  of  the  record  might  contain 
information  about  the  type  of  that  variable  and  its  storage  allocation.  Or  the  key 
might  be  a symbol  in  a MIXAL  program,  with  the  rest  of  the  record  containing  the 
equivalent  of  that  symbol.  The  tree  search  and  insertion  routines  to  be  described 
in  this  section  are  quite  efficient  for  use  as  symbol  table  algorithms,  especially  in 
applications  where  it  is  desirable  to  print  out  a list  of  the  symbols  in  alphabetic 
order.  Other  symbol  table  algorithms  are  described  in  Sections  6.3  and  6.4. 

Figure 10 shows a binary search tree containing the names of eleven signs of
the zodiac. If we now search for the twelfth name, SAGITTARIUS, starting at the
root or apex of the tree, we find it is greater than CAPRICORN, so we move to the
right; it is greater than PISCES, so we move right again; it is less than TAURUS, so
we move left; and it is less than SCORPIO, so we arrive at external node [8]. The
search was unsuccessful; we can now insert SAGITTARIUS at the place the search
ended, by linking it into the tree in place of the external node [8]. In this way
the table can grow without the necessity of moving any of the existing records.
Figure 10 was formed by starting with an empty tree and successively inserting
the keys CAPRICORN, AQUARIUS, PISCES, ARIES, TAURUS, GEMINI, CANCER, LEO,
VIRGO, LIBRA, SCORPIO, in this order.

Fig. 10. A binary search tree.
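The search just described can be replayed mechanically. Below is a minimal Python sketch; the `Node` class and the helpers `insert` and `search_path` are illustrative names of our own, and Python's built-in string comparison is assumed to serve as alphabetic order (true for these all-capital names). It builds the tree of Fig. 10 from the eleven keys in the order listed and reports which keys SAGITTARIUS is compared with.

```python
class Node:
    """Hypothetical record with a key and two links (LLINK/RLINK in the text)."""
    def __init__(self, key):
        self.key = key
        self.left = None   # LLINK
        self.right = None  # RLINK


def insert(root, key):
    """Walk down from the root and attach key at the external node where an
    unsuccessful search ends; return the (possibly new) root."""
    if root is None:
        return Node(key)
    p = root
    while True:
        if key == p.key:
            return root                      # already present
        if key < p.key:
            if p.left is None:
                p.left = Node(key)
                return root
            p = p.left
        else:
            if p.right is None:
                p.right = Node(key)
                return root
            p = p.right


def search_path(root, key):
    """Return (keys compared, found?) for a search starting at the root."""
    path, p = [], root
    while p is not None:
        path.append(p.key)
        if key == p.key:
            return path, True
        p = p.left if key < p.key else p.right
    return path, False


ZODIAC = ["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
          "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]
root = None
for name in ZODIAC:
    root = insert(root, name)

print(search_path(root, "SAGITTARIUS"))
# (['CAPRICORN', 'PISCES', 'TAURUS', 'SCORPIO'], False)
```

The unsuccessful search ends below SCORPIO on its left side, which is exactly where the text links in the new node.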

All  of  the  keys  in  the  left  subtree  of  the  root  in  Fig.  10  are  alphabetically 
less  than  CAPRICORN,  and  all  keys  in  the  right  subtree  are  alphabetically  greater. 
A similar  statement  holds  for  the  left  and  right  subtrees  of  every  node.  It  follows 
that  the  keys  appear  in  strict  alphabetic  sequence  from  left  to  right, 

AQUARIUS,  ARIES,  CANCER,  CAPRICORN,  GEMINI,  LEO,  ...,  VIRGO 

if  we  traverse  the  tree  in  symmetric  order  (see  Section  2.3.1),  since  symmetric 
order  is  based  on  traversing  the  left  subtree  of  each  node  just  before  that  node, 
then  traversing  the  right  subtree. 
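A short Python sketch (hypothetical helper names; a key that is already present is simply ignored on insertion) confirms that symmetric order visits the keys of Fig. 10 alphabetically:

```python
class Node:
    """Hypothetical tree node: key plus left and right links."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Recursive tree insertion; a duplicate key leaves the tree unchanged."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def symmetric_order(node):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return symmetric_order(node.left) + [node.key] + symmetric_order(node.right)


keys = ["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
        "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]
root = None
for k in keys:
    root = insert(root, k)
print(symmetric_order(root))
# ['AQUARIUS', 'ARIES', 'CANCER', 'CAPRICORN', 'GEMINI', 'LEO',
#  'LIBRA', 'PISCES', 'SCORPIO', 'TAURUS', 'VIRGO']
```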

The  following  algorithm  spells  out  the  searching  and  insertion  processes  in 
detail. 

Algorithm T (Tree search and insertion). Given a table of records that form a
binary tree as described above, this algorithm searches for a given argument K.
If K is not in the table, a new node containing K is inserted into the tree in the
appropriate place.



The  nodes  of  the  tree  are  assumed  to  contain  at  least  the  following  fields: 

KEY(P)  = key  stored  in  NODE(P); 

LLINK(P)  = pointer  to  left  subtree  of  NODE(P); 

RLINK(P)  = pointer  to  right  subtree  of  NODE(P). 

Null subtrees (the external nodes in Fig. 10) are represented by the null pointer Λ.

The variable ROOT points to the root of the tree. For convenience, we assume
that the tree is not empty (that is, ROOT ≠ Λ), since the necessary operations
are trivial when ROOT = Λ.

T1. [Initialize.] Set P ← ROOT. (The pointer variable P will move down the tree.)

T2. [Compare.] If K < KEY(P), go to T3; if K > KEY(P), go to T4; and if
K = KEY(P), the search terminates successfully.

T3. [Move left.] If LLINK(P) ≠ Λ, set P ← LLINK(P) and go back to T2.
Otherwise go to T5.

T4. [Move right.] If RLINK(P) ≠ Λ, set P ← RLINK(P) and go back to T2.

T5. [Insert into tree.] (The search is unsuccessful; we will now put K into the
tree.) Set Q ⇐ AVAIL, the address of a new node. Set KEY(Q) ← K,
LLINK(Q) ← RLINK(Q) ← Λ. (In practice, other fields of the new node
should also be initialized.) If K was less than KEY(P), set LLINK(P) ← Q,
otherwise set RLINK(P) ← Q. (At this point we could set P ← Q and
terminate the algorithm successfully.) |
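One way to transcribe Algorithm T into Python is sketched below. It is an illustration under stated assumptions, not a definitive implementation: Python's `None` stands for Λ, and allocating a fresh `Node` object stands in for taking a node from the AVAIL list in step T5. The step labels in the comments follow the text.

```python
class Node:
    """Record with KEY, LLINK and RLINK fields; None stands for the null link."""
    def __init__(self, key):
        self.key = key
        self.llink = None
        self.rlink = None


def tree_search_insert(root, k):
    """Sketch of Algorithm T. Returns (node, found). If k is absent, a new
    node is linked in where the search ended. Like the text, it assumes the
    tree is nonempty (root is not None)."""
    p = root                                  # T1. Initialize.
    while True:
        if k == p.key:                        # T2. Compare: success.
            return p, True
        if k < p.key:                         # T3. Move left.
            if p.llink is None:
                break
            p = p.llink
        else:                                 # T4. Move right (k > KEY(P)).
            if p.rlink is None:
                break
            p = p.rlink
    q = Node(k)                               # T5. Insert into tree.
    if k < p.key:
        p.llink = q
    else:
        p.rlink = q
    return q, False


root = Node("CAPRICORN")
for name in ["AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
             "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]:
    tree_search_insert(root, name)
node, found = tree_search_insert(root, "SAGITTARIUS")
print(found, node is root.rlink.rlink.llink.llink)   # False True
```

A second search for SAGITTARIUS would now succeed and return the same node, so the routine serves both as a lookup and as an insertion.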


Fig.  11.  Tree  search  and  insertion. 


This algorithm lends itself to a convenient machine language implementation.
We may assume, for example, that the tree nodes have the form

    +---+---+-------+-------+
    | + | 0 | LLINK | RLINK |
    +---+---+-------+-------+        (1)
    |          KEY          |
    +-----------------------+

followed perhaps by additional words of INFO. Using an AVAIL list for the free
storage pool, as in Chapter 2, we can write the following MIX program:
Program T (Tree search and insertion). rA ≡ K, rI1 ≡ P, rI2 ≡ Q.

    01  LLINK  EQU   2:3
    02  RLINK  EQU   4:5
    03  START  LDA   K            1          T1. Initialize.
    04         LD1   ROOT         1          P ← ROOT.
    05         JMP   2F           1
    06  4H     LD2   0,1(RLINK)   C2         T4. Move right. Q ← RLINK(P).
    07         J2Z   5F           C2         To T5 if Q = Λ.
    08  1H     ENT1  0,2          C-1        P ← Q.
    09  2H     CMPA  1,1          C          T2. Compare.
    10         JG    4B           C          To T4 if K > KEY(P).
    11         JE    SUCCESS      C1         Exit if K = KEY(P).
    12         LD2   0,1(LLINK)   C1-S       T3. Move left. Q ← LLINK(P).
    13         J2NZ  1B           C1-S       To T2 if Q ≠ Λ.
    14  5H     LD2   AVAIL        1-S        T5. Insert into tree.
    15         J2Z   OVERFLOW     1-S
    16         LDX   0,2(RLINK)   1-S
    17         STX   AVAIL        1-S        Q ⇐ AVAIL.
    18         STA   1,2          1-S        KEY(Q) ← K.
    19         STZ   0,2          1-S        LLINK(Q) ← RLINK(Q) ← Λ.
    20         JL    1F           1-S        Was K < KEY(P)?
    21         ST2   0,1(RLINK)   A          RLINK(P) ← Q.
    22         JMP   *+2          A
    23  1H     ST2   0,1(LLINK)   1-S-A      LLINK(P) ← Q.
    24  DONE   EQU   *            1-S        Exit after insertion.  |

The first 13 lines of this program do the search; the last 11 lines do the
insertion. The running time for the searching phase is (7C + C1 − 3S + 4)u,
where

C = number of comparisons made;

C1 = number of times K ≤ KEY(P);

C2 = number of times K > KEY(P);

S = [search is successful].

On the average we have C1 = ½(C + S), since C1 + C2 = C and C1 − S has
the same probability distribution as C2; so the running time is about
(7.5C − 2.5S + 4)u. This compares favorably with the binary search algorithms that use
an implicit tree (see Program 6.2.1C). By duplicating the code as in Program
6.2.1F we could effectively eliminate line 08 of Program T, reducing the running
time to (6.5C − 2.5S + 5)u. If the search is unsuccessful, the insertion phase of
the program costs an extra 14u or 15u.
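The formula (7C + C1 − 3S + 4)u can be checked line by line against the frequency column of Program T. The tally below is a sketch that assumes the usual MIX execution times (2u for LDA, LD1, LD2, LDX, CMPA, and the store instructions; 1u for ENT1 and for every jump; 0 for EQU):

```latex
\begin{aligned}
T_{\rm search} &= \underbrace{(2+2+1)}_{\text{lines 03--05}} + (2+1)\,C2 + 1\cdot(C-1) + (2+1)\,C + 1\cdot C1 + (2+1)(C1-S)\\
  &= 4 + 4C + 3\,C2 + 4\,C1 - 3S\\
  &= 7C + C1 - 3S + 4 \qquad (\text{since } C2 = C - C1),\\
\overline{T}_{\rm search} &\approx 7C + \tfrac12(C+S) - 3S + 4 = 7.5C - 2.5S + 4 \qquad (C1 \approx \tfrac12(C+S)),\\
\overline{T}_{\rm search} - (C-1) &\approx 6.5C - 2.5S + 5 \qquad (\text{line 08 eliminated}),\\
T_{\rm insert} &= \underbrace{(2+1+2+2+2+2+1)}_{\text{lines 14--20}} + \begin{cases} 2+1 = 3 & \text{(lines 21--22)}\\ 2 & \text{(line 23)}\end{cases}
 \;=\; 15 \text{ or } 14.
\end{aligned}
```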

Algorithm T can conveniently be adapted to variable-length keys and
variable-length records. For example, if we allocate the available space sequentially,
in  a last-in-first-out  manner,  we  can  easily  create  nodes  of  varying  size;  the  first 
word  of  (l)  could  indicate  the  size.  Since  this  is  an  efficient  use  of  storage, 
symbol  table  algorithms  based  on  trees  are  often  especially  attractive  for  use  in 
compilers,  assemblers,  and  loaders. 
But what about the worst case? Programmers are often skeptical of
Algorithm T when they first see it. If the keys of Fig. 10 had been entered into
the tree in alphabetic order AQUARIUS, ..., VIRGO instead of the calendar order
CAPRICORN, ..., SCORPIO, the algorithm would have built a degenerate tree that
essentially specifies a sequential search. All LLINKs would be null. Similarly, if
the  keys  come  in  the  uncommon  order 


AQUARIUS,  VIRGO,  ARIES,  TAURUS,  CANCER,  SCORPIO, 

CAPRICORN,  PISCES,  GEMINI,  LIBRA,  LEO 

we  obtain  a “zigzag”  tree  that  is  just  as  bad.  (Try  it!) 
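Both degenerate cases are easy to test. This Python sketch (helper names are ours; keys are assumed distinct) inserts each sequence into an empty tree and reports the deepest level, which for an 11-node tree can reach 10 only when every node has at most one child:

```python
def build(keys):
    """Insert distinct keys, in order, into an empty tree.
    Returns (root, child) where child[k] = [left, right] (None = null link)."""
    root, child = None, {}
    for k in keys:
        child[k] = [None, None]
        if root is None:
            root = k
            continue
        p = root
        while True:
            side = 0 if k < p else 1          # 0 = left, 1 = right
            if child[p][side] is None:
                child[p][side] = k
                break
            p = child[p][side]
    return root, child


def levels(root, child):
    """Level of every key (the root is on level 0)."""
    out, stack = {}, [(root, 0)]
    while stack:
        k, lev = stack.pop()
        out[k] = lev
        stack.extend((c, lev + 1) for c in child[k] if c is not None)
    return out


alphabetic = sorted(["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
                     "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"])
zigzag = ["AQUARIUS", "VIRGO", "ARIES", "TAURUS", "CANCER", "SCORPIO",
          "CAPRICORN", "PISCES", "GEMINI", "LIBRA", "LEO"]
for order in (alphabetic, zigzag):
    root, child = build(order)
    print(max(levels(root, child).values()))   # 10 in both cases: a single path
```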

On the other hand, the particular tree in Fig. 10 requires only $3\frac{2}{11}$
comparisons, on the average, for a successful search; this is just a little higher than
the minimum possible average number of comparisons, 3, achievable in the best
possible binary tree.
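The figures 3 2/11 and 3 can be reproduced from node levels, since a successful search for a key on level l takes l + 1 comparisons. In this Python sketch (names are ours) the "medians first" insertion order is our own illustrative choice of an optimal tree:

```python
from fractions import Fraction


def insertion_levels(keys):
    """Insert distinct keys into an empty tree and return {key: level}.
    A key's level equals the comparisons made when it was inserted."""
    child, level, root = {}, {}, None
    for k in keys:
        child[k] = [None, None]
        if root is None:
            root, level[k] = k, 0
            continue
        p, d = root, 0
        while True:
            d += 1
            side = 0 if k < p else 1
            if child[p][side] is None:
                child[p][side] = k
                level[k] = d
                break
            p = child[p][side]
    return level


def average_successful(keys):
    """Average comparisons in a successful search: 1 + (internal path length)/N."""
    lev = insertion_levels(keys)
    return 1 + Fraction(sum(lev.values()), len(lev))


fig10 = ["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
         "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]
s = sorted(fig10)
balanced = [s[i] for i in (5, 2, 8, 0, 3, 6, 9, 1, 4, 7, 10)]  # medians first
print(average_successful(fig10), average_successful(balanced))   # 35/11 3
```

The internal path length of Fig. 10 is 24, against 22 for the balanced tree, which accounts for the extra 2/11 comparison.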

When we have a fairly balanced tree, the search time is roughly proportional
to log N, but when we have a degenerate tree, the search time is roughly
proportional to N. Exercise 2.3.4.5-5 proves that the average search time would
be roughly proportional to √N if we considered each N-node binary tree to be
equally likely. What behavior can we really expect from Algorithm T?

Fortunately, it turns out that tree search will require only about $2\ln N \approx
1.386\lg N$ comparisons, if the keys are inserted into the tree in random order;
well-balanced trees are common, and degenerate trees are very rare.

There is a surprisingly simple proof of this fact. Let us assume that each of
the N! possible orderings of the N keys is an equally likely sequence of insertions
for building the tree. The number of comparisons needed to find a key is exactly
one more than the number of comparisons that were needed when that key was
entered into the tree. Therefore if $C_N$ is the average number of comparisons
involved in a successful search and $C'_N$ is the average number in an unsuccessful
search, we have

$$C_N = 1 + \frac{C'_0 + C'_1 + \cdots + C'_{N-1}}{N}. \tag{2}$$

But the relation between internal and external path length tells us that

$$C_N = \Bigl(1 + \frac1N\Bigr)C'_N - 1; \tag{3}$$

this is Eq. 6.2.1-(2). Putting (3) together with (2) yields

$$(N+1)C'_N = 2N + C'_0 + C'_1 + \cdots + C'_{N-1}. \tag{4}$$

This recurrence is easy to solve. Subtracting the equation

$$NC'_{N-1} = 2(N-1) + C'_0 + C'_1 + \cdots + C'_{N-2},$$

we obtain

$$(N+1)C'_N - NC'_{N-1} = 2 + C'_{N-1};$$

hence

$$C'_N = C'_{N-1} + \frac{2}{N+1}.$$

Since $C'_0 = 0$, this means that

$$C'_N = 2H_{N+1} - 2. \tag{5}$$

Applying (3) and simplifying yields the desired result

$$C_N = 2\Bigl(1 + \frac1N\Bigr)H_N - 3. \tag{6}$$
Exercises 6, 7, and 8 below give more detailed information; it is possible to
compute the exact probability distribution of $C_N$ and $C'_N$, not merely the average
values.
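As a sanity check on the derivation, the following Python sketch (helper names are ours) averages over all N! insertion orders for N ≤ 6, using exact rational arithmetic, and compares the results with the closed forms $C'_N = 2H_{N+1} - 2$ and $C_N = 2(1 + 1/N)H_N - 3$. It relies on the facts that a successful search for a key on level l costs l + 1 comparisons and that an unsuccessful search ending at an external node on level l costs l comparisons. Agreement for small N is evidence, not proof.

```python
from fractions import Fraction
from itertools import permutations


def path_lengths(perm):
    """Build a tree by inserting perm into an empty tree.
    Returns (internal path length, external path length)."""
    child, level, root = {}, {}, None
    for k in perm:
        child[k] = [None, None]
        if root is None:
            root, level[k] = k, 0
            continue
        p, d = root, 0
        while True:
            d += 1
            side = 0 if k < p else 1
            if child[p][side] is None:
                child[p][side] = k
                level[k] = d
                break
            p = child[p][side]
    internal = sum(level.values())
    # Every null link is an external node, one level below its parent.
    external = sum(level[k] + 1 for k in child for c in child[k] if c is None)
    return internal, external


def harmonic(n):
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


def exact_averages(n):
    """(C_N, C'_N) averaged over all n! equally likely insertion orders."""
    orders = list(permutations(range(n)))
    tot_int = tot_ext = 0
    for perm in orders:
        i, e = path_lengths(perm)
        tot_int += i
        tot_ext += e
    c = 1 + Fraction(tot_int, n * len(orders))           # successful search
    c_prime = Fraction(tot_ext, (n + 1) * len(orders))   # unsuccessful search
    return c, c_prime


for n in range(1, 7):
    c, c_prime = exact_averages(n)
    assert c_prime == 2 * harmonic(n + 1) - 2                  # closed form for C'_N
    assert c == 2 * (1 + Fraction(1, n)) * harmonic(n) - 3     # closed form for C_N
print("closed forms agree with exhaustive enumeration for N <= 6")
```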

Tree  insertion  sorting.  Algorithm  T was  developed  for  searching,  but  it  can 
also  be  used  as  the  basis  of  an  internal  sorting  algorithm;  in  fact,  we  can  view 
it as a natural generalization of list insertion, Algorithm 5.2.1L. When properly
programmed, its average running time will be only a little slower than some of the
best algorithms we discussed in Chapter 5. After the tree has been constructed
for all keys, a symmetric tree traversal (Algorithm 2.3.1T) will visit the records
in  sorted  order. 

A few  precautions  are  necessary,  however.  Something  different  needs  to  be 
done  if  K = KEY(P)  in  step  T2,  since  we  are  sorting  instead  of  searching.  One 
solution  is  to  treat  K = KEY (P)  exactly  as  if  K > KEY  (P) ; this  leads  to  a stable 
sorting  method.  (Equal  keys  will  not  necessarily  be  adjacent  in  the  tree;  they  will 
only  be  adjacent  in  symmetric  order.)  But  if  many  duplicate  keys  are  present, 
this  method  will  cause  the  tree  to  get  badly  unbalanced,  and  the  sorting  will 
slow  down.  Another  idea  is  to  keep  a list,  for  each  node,  of  all  records  having 
the  same  key;  this  requires  another  link  field,  but  it  will  make  the  sorting  faster 
when  a lot  of  equal  keys  occur. 
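Here is a minimal Python sketch of tree insertion sorting that adopts the first precaution just described: a key equal to KEY(P) is sent to the right, exactly as if it were greater, so the sort is stable. The list-of-lists node representation and the `key` parameter are illustrative choices, not part of the text; the symmetric traversal uses an explicit stack so that a degenerate tree cannot exhaust Python's recursion limit.

```python
def tree_insertion_sort(records, key=lambda r: r):
    """Stable sort by tree insertion followed by symmetric-order traversal.
    Nodes are kept in a list as [record, left_index, right_index]."""
    nodes = []
    for rec in records:
        nodes.append([rec, None, None])
        new = len(nodes) - 1
        if new == 0:
            continue                                  # first record is the root
        k, p = key(rec), 0
        while True:
            side = 1 if k < key(nodes[p][0]) else 2   # equal keys go right
            if nodes[p][side] is None:
                nodes[p][side] = new
                break
            p = nodes[p][side]
    out, stack = [], []
    p = 0 if nodes else None
    while stack or p is not None:                     # iterative symmetric order
        while p is not None:
            stack.append(p)
            p = nodes[p][1]
        p = stack.pop()
        out.append(nodes[p][0])
        p = nodes[p][2]
    return out


records = [(3, "a"), (1, "b"), (3, "c"), (2, "d"), (1, "e")]
print(tree_insertion_sort(records, key=lambda r: r[0]))
# [(1, 'b'), (1, 'e'), (2, 'd'), (3, 'a'), (3, 'c')]
```

Records with equal keys come out in their original order, as the text predicts, even though they need not be adjacent in the tree itself.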

Thus  if  we  are  interested  only  in  sorting,  not  in  searching,  Algorithm  T isn’t 
the  best,  but  it  isn’t  bad.  And  if  we  have  an  application  that  combines  searching 
with  sorting,  the  tree  method  can  be  warmly  recommended. 

It  is  interesting  to  note  that  there  is  a strong  relation  between  the  analysis 
of  tree  insertion  sorting  and  the  analysis  of  quicksort,  although  the  methods 
are  superficially  dissimilar.  If  we  successively  insert  N keys  into  an  initially 
empty  tree,  we  make  the  same  average  number  of  comparisons  between  keys  as 
Algorithm 5.2.2Q does, with minor exceptions. For example, in tree insertion
every key gets compared with $K_1$, and then every key less than $K_1$ gets compared
with the first key less than $K_1$, etc.; in quicksort, every key gets compared to
the first partitioning element K and then every key less than K gets compared
to a particular element less than K, etc. The average number of comparisons
needed in both cases is $NC_N - N$. (However, Algorithm 5.2.2Q actually makes
a few  more  comparisons,  in  order  to  speed  up  the  inner  loops.) 

Deletions.  Sometimes  we  want  to  make  the  computer  forget  one  of  the  table 
entries it knows. We can easily delete a node in which either LLINK or RLINK = Λ;
but  when  both  subtrees  are  nonempty,  we  have  to  do  something  special,  since 
we  can’t  point  two  ways  at  once. 
For example, consider Fig. 10 again; how could we delete the root node,
CAPRICORN?  One  solution  is  to  delete  the  alphabetically  next  node,  which  always 
has  a null  LLINK,  then  reinsert  it  in  place  of  the  node  we  really  wanted  to  delete. 
For  example,  in  Fig.  10  we  could  delete  GEMINI,  then  replace  CAPRICORN  by 
GEMINI.  This  operation  preserves  the  essential  left-to-right  order  of  the  table 
entries.  The  following  algorithm  gives  a detailed  description  of  such  a deletion 
process. 

Algorithm D (Tree deletion). Let Q be a variable that points to a node of a
binary search tree represented as in Algorithm T. This algorithm deletes that
node, leaving a binary search tree. (In practice, we will have either Q = ROOT or
Q = LLINK(P) or RLINK(P) in some node of the tree. This algorithm resets the
value of Q in memory, to reflect the deletion.)

D1. [Is RLINK null?] Set T ← Q. If RLINK(T) = Λ, set Q ← LLINK(T) and go
to D4. (For example, if Q = RLINK(P) for some P, we would set RLINK(P) ←
LLINK(T).)

D2. [Find successor.] Set R ← RLINK(T). If LLINK(R) = Λ, set LLINK(R) ←
LLINK(T), Q ← R, and go to D4.

D3. [Find null LLINK.] Set S ← LLINK(R). Then if LLINK(S) ≠ Λ, set R ← S
and repeat this step until LLINK(S) = Λ. (At this point S will be equal
to Q$, the symmetric successor of Q.) Finally, set LLINK(S) ← LLINK(T),
LLINK(R) ← RLINK(S), RLINK(S) ← RLINK(T), Q ← S.

D4. [Free the node.] Set AVAIL ⇐ T, thus returning the deleted node to the free
storage pool. |
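A Python transcription of Algorithm D might look like the sketch below. Because Python has no pointer to a link field, the function returns the subtree that should replace the deleted node, and the caller stores it back into ROOT, LLINK(P), or RLINK(P); this calling convention is our own assumption, standing in for "resets the value of Q in memory." The usage lines rebuild Fig. 10 and delete CAPRICORN, which GEMINI replaces, as described above.

```python
class Node:
    """Record with KEY, LLINK, RLINK; None plays the role of the null link."""
    def __init__(self, key):
        self.key, self.llink, self.rlink = key, None, None


def tree_delete(q):
    """Sketch of Algorithm D. q is the node to delete; the return value is
    what the caller must store into the link that pointed to q."""
    t = q
    if t.rlink is None:                 # D1. RLINK null: splice in left subtree.
        return t.llink
    r = t.rlink                         # D2. Find successor.
    if r.llink is None:
        r.llink = t.llink
        return r
    s = r.llink                         # D3. Find null LLINK.
    while s.llink is not None:
        r, s = s, s.llink
    s.llink = t.llink                   # s is the symmetric successor of t
    r.llink = s.rlink
    s.rlink = t.rlink
    return s                            # D4. (t would go back to AVAIL here.)


def insert(root, key):
    """Plain tree insertion, used here only to rebuild Fig. 10."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.llink = insert(root.llink, key)
    else:
        root.rlink = insert(root.rlink, key)
    return root


def symmetric(n):
    return [] if n is None else symmetric(n.llink) + [n.key] + symmetric(n.rlink)


root = None
for k in ["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
          "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]:
    root = insert(root, k)
root = tree_delete(root)                  # delete CAPRICORN, as in the text
print(root.key, root.rlink.llink.key)     # GEMINI LEO
```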

The  reader  may  wish  to  try  this  algorithm  by  deleting  AQUARIUS,  CANCER, 
and  CAPRICORN  from  Fig.  10;  each  case  is  slightly  different.  An  alert  reader  may 
have noticed that no special test has been made for the case RLINK(T) ≠ Λ,
LLINK(T) = Λ; we will defer the discussion of this case until later, since the
algorithm  as  it  stands  has  some  very  interesting  properties. 

Since  Algorithm  D is  quite  unsymmetrical  between  left  and  right,  it  stands 
to  reason  that  a sequence  of  deletions  will  make  the  tree  get  way  out  of  balance, 
so  that  the  efficiency  estimates  we  have  made  will  be  invalid.  But  deletions  don’t 
actually  make  the  trees  degenerate  at  all! 

Theorem  H (T.  N.  Hibbard,  1962).  After  a random  element  is  deleted  from  a 
random  tree  by  Algorithm  D,  the  resulting  tree  is  still  random. 

[Nonmathematical readers, please skip to (10).] This statement of the theorem
is admittedly quite vague. We can summarize the situation more precisely
as follows: Let T be a tree of n elements, and let P(T) be the probability that
T occurs if its keys are inserted in random order by Algorithm T. Some trees
are more probable than others. Let Q(T) be the probability that T will occur if
n + 1 elements are inserted in random order by Algorithm T and then one of these
elements is chosen at random and deleted by Algorithm D. In calculating P(T),
we assume that the n! permutations of the keys are equally likely; in calculating
Q(T), we assume that the (n + 1)! (n + 1) permutations of keys and selections
of the doomed key are equally likely. The theorem states that P(T) = Q(T)
for all T.


Proof.  We  are  faced  with  the  fact  that  permutations  are  equally  probable,  not 
trees,  and  therefore  we  shall  prove  the  result  by  considering  permutations  as  the 
random  objects.  We  shall  define  a deletion  from  a permutation,  and  then  we 
will  prove  that  “a  random  element  deleted  from  a random  permutation  leaves  a 
random  permutation.” 

Let $a_1 a_2 \ldots a_{n+1}$ be a permutation of $\{1, 2, \ldots, n+1\}$; we want to define the
operation of deleting $a_i$, so as to obtain a permutation $b_1 b_2 \ldots b_n$ of $\{1, 2, \ldots, n\}$.
This operation should correspond to Algorithms T and D, so that if we start
with the tree constructed from the sequence of insertions $a_1, a_2, \ldots, a_{n+1}$ and
delete $a_i$, renumbering the keys from 1 to n, we obtain the tree constructed from
$b_1 b_2 \ldots b_n$.

It  is  not  hard  to  define  such  a deletion  operation.  There  are  two  cases: 

Case 1: $a_i = n + 1$, or $a_i + 1 = a_j$ for some $j < i$. (This is essentially the
condition "RLINK($a_i$) = Λ.") Remove $a_i$ from the sequence, and subtract unity
from each element greater than $a_i$.

Case 2: $a_i + 1 = a_j$ for some $j > i$. Replace $a_i$ by $a_j$, remove $a_j$ from its
original place, and subtract unity from each element greater than $a_i$.

For example, suppose we have the permutation 4 6 1 3 5 2. If we circle the
element to be deleted (the circled element is shown here in square brackets), we have

    [4] 6 1 3 5 2 = 4 5 1 3 2        4 6 1 [3] 5 2 = 3 5 1 4 2
    4 [6] 1 3 5 2 = 4 1 3 5 2        4 6 1 3 [5] 2 = 4 5 1 3 2
    4 6 [1] 3 5 2 = 3 5 1 2 4        4 6 1 3 5 [2] = 3 5 1 2 4

Since there are (n + 1)! (n + 1) possible deletion operations, the theorem will be
established if we can show that every permutation of {1, 2, ..., n} is the result
of exactly $(n + 1)^2$ deletions.

Let $b_1 b_2 \ldots b_n$ be a permutation of $\{1, 2, \ldots, n\}$. We shall define $(n + 1)^2$
deletions, one for each pair i, j with $1 \le i, j \le n + 1$, as follows:

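If i < j, the deletion is

$$b'_1 \ldots b'_{i-1}\;[b_i]\;b'_{i+1} \ldots b'_{j-1}\;(b_i + 1)\;b'_j \ldots b'_n. \tag{7}$$

Here, as below, the circled element is shown in square brackets, and $b'_k$ stands for
either $b_k$ or $b_k + 1$, depending on whether or not $b_k$ is less than the circled element.
This deletion corresponds to Case 2.

If i > j, the deletion is

$$b'_1 \ldots b'_{j-1}\;(b_j + 1)\;b'_{j+1} \ldots b'_{i-1}\;[b_j]\;b'_i \ldots b'_n; \tag{8}$$

this deletion fits the definition of Case 1.

Finally, if i = j, we have another Case 1 deletion, namely

$$b'_1 \ldots b'_{i-1}\;[n+1]\;b'_i \ldots b'_n. \tag{9}$$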
As an example, let n = 4 and consider the 25 deletions that map into 3 1 4 2
(circled elements shown in square brackets):

             i = 1         i = 2         i = 3         i = 4         i = 5
    j = 1    [5] 3 1 4 2   4 [3] 1 5 2   4 1 [3] 5 2   4 1 5 [3] 2   4 1 5 2 [3]
    j = 2    [3] 4 1 5 2   3 [5] 1 4 2   4 2 [1] 5 3   4 2 5 [1] 3   4 2 5 3 [1]
    j = 3    [3] 1 4 5 2   4 [1] 2 5 3   3 1 [5] 4 2   3 1 5 [4] 2   3 1 5 2 [4]
    j = 4    [3] 1 5 4 2   4 [1] 5 2 3   3 1 [4] 5 2   3 1 4 [5] 2   4 1 5 3 [2]
    j = 5    [3] 1 5 2 4   4 [1] 5 3 2   3 1 [4] 2 5   4 1 5 [2] 3   3 1 4 2 [5]

The circled element is always in position i, and for fixed i we have
constructed n + 1 different deletions, one for each j; hence $(n + 1)^2$ different deletions
have been constructed for each permutation $b_1 b_2 \ldots b_n$. Since only $(n + 1)^2 n!$
deletions are possible, we must have found all of them. |
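Theorem H can also be checked by brute force for small n. The following Python sketch (all names are our own) builds the tree for every permutation of {1, ..., n+1}, deletes each key with a transcription of steps D1-D3, renumbers, and compares the result with the tree built from the permutation given by Cases 1 and 2; it also counts how often each permutation of {1, ..., n} arises. This is only a finite sanity check of the bijection argument above, not a substitute for the proof.

```python
import math
from collections import Counter
from itertools import permutations


class Node:
    def __init__(self, key):
        self.key, self.llink, self.rlink = key, None, None


def build(perm):
    """Tree insertion (as in Algorithm T) of the keys of perm into an empty tree."""
    root = None
    for k in perm:
        if root is None:
            root = Node(k)
            continue
        p = root
        while True:
            side = "llink" if k < p.key else "rlink"
            nxt = getattr(p, side)
            if nxt is None:
                setattr(p, side, Node(k))
                break
            p = nxt
    return root


def algorithm_d(t):
    """Steps D1-D3: return the subtree that replaces node t."""
    if t.rlink is None:
        return t.llink
    r = t.rlink
    if r.llink is None:
        r.llink = t.llink
        return r
    s = r.llink
    while s.llink is not None:
        r, s = s, s.llink
    s.llink = t.llink
    r.llink = s.rlink
    s.rlink = t.rlink
    return s


def delete_key(root, x):
    """Locate key x (assumed present) and remove it with algorithm_d."""
    if root.key == x:
        return algorithm_d(root)
    p = root
    while True:
        side = "llink" if x < p.key else "rlink"
        child = getattr(p, side)
        if child.key == x:
            setattr(p, side, algorithm_d(child))
            return root
        p = child


def shape(node, x=None):
    """Tree as nested tuples (key, left, right); keys above x drop by 1."""
    if node is None:
        return None
    k = node.key - 1 if x is not None and node.key > x else node.key
    return (k, shape(node.llink, x), shape(node.rlink, x))


def perm_delete(a, i):
    """Cases 1 and 2 of the text, with a 0-based position i."""
    x, b = a[i], list(a)
    if x + 1 in b[i + 1:]:
        j = b.index(x + 1)
        b[i] = b[j]
        del b[j]
    else:
        del b[i]
    return tuple(v - 1 if v > x else v for v in b)


def check_theorem_h(n):
    """Verify the tree correspondence for every deletion; count the results."""
    counts = Counter()
    for a in permutations(range(1, n + 2)):
        for i in range(n + 1):
            b = perm_delete(a, i)
            counts[b] += 1
            assert shape(delete_key(build(a), a[i]), a[i]) == shape(build(b))
    return counts


for n in range(1, 5):
    counts = check_theorem_h(n)
    assert len(counts) == math.factorial(n)
    assert set(counts.values()) == {(n + 1) ** 2}
print("Theorem H verified by enumeration for n = 1, 2, 3, 4")
```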

The  proof  of  Theorem  H not  only  tells  us  about  the  result  of  deletions,  it 
also  helps  us  analyze  the  running  time  in  an  average  deletion.  Exercise  12  shows 
that  we  can  expect  to  execute  step  D2  slightly  less  than  half  the  time,  on  the 
average,  when  deleting  a random  element  from  a random  table. 

Let us now consider how often the loop in step D3 needs to be performed. Suppose that we are deleting a node on level l, and that the external node immediately following it in symmetric order is on level k. For example, if we are deleting CAPRICORN from Fig. 10, we have l = 0 and k = 3, since the external node that follows CAPRICORN is on level 3. If k = l + 1, we have RLINK(T) = Λ in step D1; and if k > l + 1, we will set S ← LLINK(R) exactly k − l − 2 times in step D3. The average value of l is (internal path length)/N; the average value of k is

(external path length − distance to leftmost external node)/N.

The distance to the leftmost external node is the number of left-to-right minima in the insertion sequence, so it has the average value $H_N$ by the analysis of Section 1.2.10. Since external path length minus internal path length is 2N, the average value of k − l − 2 is $-H_N/N$. Adding to this the average number of times that k − l − 2 is −1 (this happens exactly when RLINK(T) = Λ; by left-right symmetry a random tree has $(N+1)/2$ null RLINKs on the average, so the probability is $(N+1)/(2N)$), we see that the operation S ← LLINK(R) in step D3 is performed only

$$\tfrac12 + \bigl(\tfrac12 - H_N\bigr)/N \qquad (10)$$

times, on the average, in a random deletion. This is reassuring, since the worst case can be pretty slow (see exercise 11).
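Formula (10) is exact, so it can be confirmed by averaging over all N! insertion orders and all N choices of the deleted key. This Python sketch is an illustration under stated assumptions: the tuple representation of trees and the names `d3_steps`, `average_d3_steps`, and `formula_10` are ours, and `d3_steps` simply counts how many times step D3 would execute S ← LLINK(R).

```python
from fractions import Fraction
from itertools import permutations

def insert(t, k):
    """Unbalanced BST insertion; a tree is None or a tuple (key, left, right)."""
    if t is None:
        return (k, None, None)
    key, left, right = t
    if k < key:
        return (key, insert(left, k), right)
    return (key, left, insert(right, k))

def d3_steps(t, x):
    """Times step D3 of Algorithm D sets S <- LLINK(R) while deleting key x."""
    while t[0] != x:                   # search for the node T that contains x
        t = t[1] if x < t[0] else t[2]
    r = t[2]                           # R <- RLINK(T)
    if r is None or r[1] is None:      # deletion is finished in step D1 or D2
        return 0
    steps = 0
    while r[1] is not None:            # S <- LLINK(R); if LLINK(S) != null, R <- S
        r = r[1]
        steps += 1
    return steps

def average_d3_steps(n):
    """Exact average over all n! insertion orders and all n deleted keys."""
    total = count = 0
    for perm in permutations(range(1, n + 1)):
        t = None
        for k in perm:
            t = insert(t, k)
        for x in perm:
            total += d3_steps(t, x)
            count += 1
    return Fraction(total, count)

def formula_10(n):
    """1/2 + (1/2 - H_n)/n, the right-hand side of (10)."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return Fraction(1, 2) + (Fraction(1, 2) - h) / n

print(average_d3_steps(3), formula_10(3))  # both 1/18
```

Exact rational arithmetic removes any doubt about rounding. For N = 3, only one of the 18 deletions (removing 1 from the tree built by 1 3 2) enters the loop of step D3.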

Although Theorem H is rigorously true, in the precise form we have stated it, it cannot be applied, as we might expect, to a sequence of deletions followed by insertions. The shape of the tree is random after deletions, but the relative distribution of values in a given tree shape may change, and it turns out that the first random insertion after deletion actually destroys the randomness property on the shapes. This startling fact, first observed by Gary Knott in 1972, must be seen to be believed (see exercise 15). Even more startling is the empirical evidence gathered by J. L. Eppinger [CACM 26 (1983), 663-669; 27 (1984), 235], who found that the path length decreases slightly when a few random deletions and insertions are made, but then it increases until reaching a steady state after about N² deletion/insertion operations have been performed. This steady state is worse than the behavior of a random tree, when N is greater than about 150. Further study by Culberson and Munro [Comp. J. 32 (1989), 68-75; Algorithmica 5 (1990), 295-311] has led to a plausible conjecture that the average search time in the steady state is asymptotically $\sqrt{2N/(9\pi)}$. However, Eppinger also devised a simple modification that alternates between Algorithm D and a left-right reflection of the same algorithm; he found that this leads to an excellent steady state in which the path length is reduced to about 88% of its normal value for random trees. A theoretical explanation for this behavior is still lacking.

As mentioned above, Algorithm D does not test for the case LLINK(T) = Λ, although this is one of the easy cases for deletion. We could add a new step between D1 and D2, namely,

D1.5. [Is LLINK null?] If LLINK(T) = Λ, set Q ← RLINK(T) and go to D4.

Exercise  14  shows  that  Algorithm  D with  this  extra  step  always  leaves  a tree 
that  is  at  least  as  good  as  the  original  Algorithm  D,  in  the  path-length  sense,  and 
sometimes  the  result  is  even  better.  When  this  idea  is  combined  with  Eppinger’s 
symmetric  deletion  strategy,  the  steady-state  path  length  for  repeated  random 
deletion/insertion  operations  decreases  to  about  86%  of  its  insertion-only  value. 

Frequency of access. So far we have assumed that each key was equally likely as a search argument. In a more general situation, let $p_k$ be the probability that we will search for the kth element inserted, where $p_1 + \cdots + p_N = 1$. Then a straightforward modification of Eq. (2), if we retain the assumption of random order so that the shape of the tree stays random and Eq. (5) holds, shows that the average number of comparisons in a successful search will be

$$1 + \sum_{k=1}^{N} p_k(2H_k - 2) = 2\sum_{k=1}^{N} p_k H_k - 1. \qquad (11)$$

For example, if the probabilities obey Zipf's law, Eq. 6.1-(8), the average number of comparisons reduces to

$$H_N - 1 + H_N^{(2)}/H_N \qquad (12)$$

if we insert the keys in decreasing order of importance. (See exercise 18.) This is about half as many comparisons as predicted by the equal-frequency analysis, and it is fewer than we would make using binary search.
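Formula (12) follows from (11) because $\sum_{k=1}^{N} H_k/k = \frac12\bigl(H_N^2 + H_N^{(2)}\bigr)$. Here is a minimal Python check with exact fractions. It assumes Zipf's law in the form $p_k = 1/(kH_N)$, and the function names are our own.

```python
from fractions import Fraction

def harmonic(n, r=1):
    """H_n^{(r)} = 1 + 1/2^r + ... + 1/n^r, as an exact fraction."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))

def average_comparisons(p):
    """Right-hand side of (11): 2 * sum(p_k * H_k) - 1, where p[k-1] is p_k."""
    return 2 * sum(pk * harmonic(k) for k, pk in enumerate(p, start=1)) - 1

def zipf(n):
    """Zipf's law: p_k = 1/(k * H_n) for 1 <= k <= n."""
    hn = harmonic(n)
    return [Fraction(1, k) / hn for k in range(1, n + 1)]

n = 31
print(float(average_comparisons(zipf(n))))                # via (11)
print(float(harmonic(n) - 1 + harmonic(n, 2) / harmonic(n)))  # via (12)
```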

Figure 12 shows the tree that results when the 31 most common words of English are entered in decreasing order of frequency. The relative frequency is shown with each word, using statistics from Cryptanalysis by H. F. Gaines (New York: Dover, 1956), 226. The average number of comparisons for a successful search in this tree is 4.042; the corresponding binary search, using Algorithm 6.2.1B or 6.2.1C, would require 4.393 comparisons.
Fig. 12. The 31 most common English words, inserted in decreasing order of frequency.

Optimum  binary  search  trees.  These  considerations  make  it  natural  to  ask 
about  the  best  possible  tree  for  searching  a table  of  keys  with  given  frequencies. 
For  example,  the  optimum  tree  for  the  31  most  common  English  words  is  shown 
in  Fig.  13;  it  requires  only  3.437  comparisons  for  an  average  successful  search. 

Let us now explore the problem of finding the optimum tree. When N = 3, for example, let us assume that the keys $K_1 < K_2 < K_3$ have respective probabilities p, q, r. There are five possible trees:

[Diagram: the five binary search trees on $K_1, K_2, K_3$, labeled I, II, III, IV, V.]   (13)

Figure 14 shows the ranges of p, q, r for which each tree is optimum; the balanced tree is best about 45 percent of the time, if we choose p, q, r at random (see exercise 21).
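For N = 3 the five trees of (13) can simply be enumerated. This Python sketch (the function names are ours) lists every binary search tree on the keys 1, 2, 3. It then picks the trees that minimize the average number of comparisons in a successful search, $\sum p_k(\text{level}+1)$, with weights p, q, r as in the text.

```python
from fractions import Fraction

def trees(lo, hi):
    """Every binary search tree on the keys lo..hi; None is the empty tree."""
    if lo > hi:
        return [None]
    return [(k, left, right)
            for k in range(lo, hi + 1)
            for left in trees(lo, k - 1)
            for right in trees(k + 1, hi)]

def successful_cost(t, weight, level=0):
    """Sum of weight[key] * (level + 1) over the nodes of t; the root has level 0."""
    if t is None:
        return 0
    key, left, right = t
    return (weight[key] * (level + 1)
            + successful_cost(left, weight, level + 1)
            + successful_cost(right, weight, level + 1))

def best_trees(p, q, r):
    """Minimum cost and all optimum trees for keys K1 < K2 < K3 with weights p, q, r."""
    weight = {1: p, 2: q, 3: r}
    costs = [(successful_cost(t, weight), t) for t in trees(1, 3)]
    best = min(cost for cost, _ in costs)
    return best, [t for cost, t in costs if cost == best]

print(len(trees(1, 3)))  # 5
print(best_trees(Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)))
```

With equal weights the balanced tree (root $K_2$) is the unique optimum, at cost 5/3. Heavily skewed weights favor a chain instead.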

Unfortunately, when N is large there are $\binom{2N}{N}\big/(N+1)$ binary trees, so we can’t just try them all and see which is best. Let us therefore study the properties of optimum binary search trees more closely, in order to discover a better way to find them.
Fig. 13. An optimum search tree for the 31 most common English words.


Fig. 14. If the relative frequencies of $(K_1, K_2, K_3)$ are (p, q, r), this graph shows which of the five trees in (13) is best. The fact that p + q + r = 1 makes the graph two-dimensional although there are three coordinates.


So  far  we  have  considered  only  the  probabilities  for  a successful  search;  in 
practice,  the  unsuccessful  case  must  usually  be  considered  as  well.  For  example, 
the  31  words  in  Fig.  13  account  for  only  about  36  percent  of  typical  English  text; 
the  other  64  percent  will  certainly  influence  the  structure  of  the  optimum  search 
tree. 

Therefore let us set the problem up in the following way: We are given 2n + 1 probabilities $p_1, p_2, \ldots, p_n$ and $q_0, q_1, \ldots, q_n$, where

$p_i$ = probability that $K_i$ is the search argument;

$q_i$ = probability that the search argument lies between $K_i$ and $K_{i+1}$.

(By convention, $q_0$ is the probability that the search argument is less than $K_1$, and $q_n$ is the probability that the search argument is greater than $K_n$.) Thus, $p_1 + p_2 + \cdots + p_n + q_0 + q_1 + \cdots + q_n = 1$, and we want to find a binary tree that minimizes the expected number of comparisons in the search, namely

$$\sum_{j=1}^{n} p_j\bigl(\mathrm{level}((j)) + 1\bigr) + \sum_{k=0}^{n} q_k\,\mathrm{level}([k]), \qquad (14)$$

where (j) is the jth internal node in symmetric order and [k] is the (k+1)st external node, and where the root has level zero. Thus the expected number of comparisons for the binary tree

[Diagram: root (3); its left child is (1), and (2) is the right child of (1).]   (15)

is $2q_0 + 2p_1 + 3q_1 + 3p_2 + 3q_2 + p_3 + q_3$. Let us call this the cost of the tree; and let us say that a minimum-cost tree is optimum. In this definition there is no need to require that the p's and q's sum to unity; we can ask for a minimum-cost tree with any given sequence of “weights” $(p_1, \ldots, p_n; q_0, \ldots, q_n)$.
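The cost (14) is easy to compute recursively if we note that an empty subtree met while scanning the keys to the right of key lo is external node [lo]. Here is a hedged Python sketch. It represents trees as nested tuples `(key, left, right)` with `None` for external nodes (our representation, not the book's), and it stores $p_k$ in `p[k-1]` and $q_k$ in `q[k]`. It reproduces the coefficients quoted for tree (15).

```python
def cost(t, p, q, lo=0, level=0):
    """Cost (14): sum of p_j*(level((j)) + 1) plus sum of q_k*level([k]).

    t covers the keys lo+1, lo+2, ...; an empty subtree there is external node [lo].
    p[j-1] is the weight p_j, and q[k] is the weight q_k.
    """
    if t is None:
        return q[lo] * level
    key, left, right = t
    return (p[key - 1] * (level + 1)
            + cost(left, p, q, lo, level + 1)
            + cost(right, p, q, key, level + 1))

# Tree (15): root (3), left child (1), and (2) is the right child of (1).
tree_15 = (3, (1, None, (2, None, None)), None)
p = [2, 3, 5]          # p1, p2, p3 (arbitrary sample weights)
q = [7, 11, 13, 17]    # q0, q1, q2, q3
print(cost(tree_15, p, q))  # 2*7 + 2*2 + 3*11 + 3*3 + 3*13 + 5 + 17 = 121
```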

We have studied Huffman’s procedure for constructing trees with minimum weighted path length, in Section 2.3.4.5; but that method requires all the p’s to be zero, and the tree it produces will usually not have the external node weights $(q_0, \ldots, q_n)$ in the proper symmetric order from left to right. Therefore we need another approach.

What saves us is that all subtrees of an optimum tree are optimum. For example, if (15) is an optimum tree for the weights $(p_1, p_2, p_3; q_0, q_1, q_2, q_3)$, then the left subtree of the root must be optimum for $(p_1, p_2; q_0, q_1, q_2)$; any improvement to a subtree leads to an improvement in the whole tree.

This principle suggests a computation procedure that systematically finds larger and larger optimum subtrees. We have used much the same idea in Section 5.4.9 to construct optimum merge patterns; the general approach is known as “dynamic programming,” and we shall consider it further in Section 7.7.

Let c(i, j) be the cost of an optimum subtree with weights $(p_{i+1}, \ldots, p_j; q_i, \ldots, q_j)$; and let $w(i,j) = p_{i+1} + \cdots + p_j + q_i + \cdots + q_j$ be the sum of all those weights; thus c(i, j) and w(i, j) are defined for 0 ≤ i ≤ j ≤ n. It follows that

$$c(i,i) = 0,$$
$$c(i,j) = w(i,j) + \min_{i<k\le j}\bigl(c(i,k-1) + c(k,j)\bigr), \quad \text{for } i < j, \qquad (16)$$

since the minimum possible cost of a tree with root (k) is w(i, j) + c(i, k−1) + c(k, j). When i < j, let R(i, j) be the set of all k for which the minimum is achieved in (16); this set specifies the possible roots of the optimum trees.

Equation (16) makes it possible to evaluate c(i, j) for j − i = 1, 2, ..., n; there are about $\frac12 n^2$ such values, and the minimization operation is carried out for about $\frac16 n^3$ values of k. This means we can determine an optimum tree in O(n³) units of time, using O(n²) cells of memory.
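Recurrence (16) translates directly into an O(n³) program. In this Python sketch (our own, with hypothetical names) `p[k-1]` holds $p_k$ and `q[k]` holds $q_k$. For checking, it also includes a brute-force minimum of (14) over all binary trees, which is feasible only for very small n.

```python
def optimum_cost_16(p, q):
    """Evaluate recurrence (16) bottom-up: O(n^3) time and O(n^2) space."""
    n = len(p)
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]   # c[i][i] = 0
    for i in range(n + 1):
        w[i][i] = q[i]
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
    for d in range(1, n + 1):                  # d = j - i
        for i in range(n - d + 1):
            j = i + d
            c[i][j] = w[i][j] + min(c[i][k - 1] + c[k][j]
                                    for k in range(i + 1, j + 1))
    return c[0][n]

def all_trees(lo, hi):
    """Every binary tree whose internal nodes are lo..hi in symmetric order."""
    if lo > hi:
        return [None]
    return [(k, left, right) for k in range(lo, hi + 1)
            for left in all_trees(lo, k - 1) for right in all_trees(k + 1, hi)]

def tree_cost(t, p, q, lo=0, level=0):
    """Cost (14); an empty subtree to the right of key lo is external node [lo]."""
    if t is None:
        return q[lo] * level
    key, left, right = t
    return (p[key - 1] * (level + 1) + tree_cost(left, p, q, lo, level + 1)
            + tree_cost(right, p, q, key, level + 1))

def optimum_cost_brute(p, q):
    """Minimum of (14) over all trees; exponential, for checking only."""
    return min(tree_cost(t, p, q) for t in all_trees(1, len(p)))

print(optimum_cost_16([2, 3, 5], [7, 11, 13, 17]))
```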

A factor of n can actually be removed from the running time if we make use of a monotonicity property. Let r(i, j) denote an element of R(i, j); we need not compute the entire set R(i, j), a single representative is sufficient. Once we have found r(i, j−1) and r(i+1, j), the result of exercise 27 proves that we may always assume that

$$r(i, j-1) \le r(i, j) \le r(i+1, j) \qquad (17)$$

when the weights are nonnegative. This limits the search for the minimum, since only r(i+1, j) − r(i, j−1) + 1 values of k need to be examined in (16) instead of j − i. The total amount of work when j − i = d is now bounded by the telescoping series

$$\sum_{\substack{d \le j \le n\\ i = j-d}} \bigl(r(i+1, j) - r(i, j-1) + 1\bigr) = r(n-d+1, n) - r(0, d-1) + n - d + 1 < 2n;$$

hence the total running time is reduced to O(n²).

The  following  algorithm  describes  this  procedure  in  detail. 

Algorithm K (Find optimum binary search trees). Given 2n + 1 nonnegative weights $(p_1, \ldots, p_n; q_0, \ldots, q_n)$, this algorithm constructs binary trees t(i, j) that have minimum cost for the weights $(p_{i+1}, \ldots, p_j; q_i, \ldots, q_j)$ in the sense defined above. Three arrays are computed, namely

c[i, j], for 0 ≤ i ≤ j ≤ n, the cost of t(i, j);

r[i, j], for 0 ≤ i < j ≤ n, the root of t(i, j);

w[i, j], for 0 ≤ i ≤ j ≤ n, the total weight of t(i, j).

The results of the algorithm are specified by the r array: If i = j, t(i, j) is null; otherwise its left subtree is t(i, r[i, j] − 1) and its right subtree is t(r[i, j], j).

K1. [Initialize.] For 0 ≤ i ≤ n, set c[i, i] ← 0 and w[i, i] ← $q_i$, and w[i, j] ← w[i, j−1] + $p_j$ + $q_j$ for j = i + 1, ..., n. Then for 1 ≤ j ≤ n set c[j−1, j] ← w[j−1, j] and r[j−1, j] ← j. (This determines all the 1-node optimum trees.)

K2. [Loop on d.] Do step K3 for d = 2, 3, ..., n, then terminate the algorithm.

K3. [Loop on j.] (We have already determined the optimum trees of fewer than d nodes. This step determines all the d-node optimum trees.) Do step K4 for j = d, d + 1, ..., n.

K4. [Find c[i, j], r[i, j].] Set i ← j − d. Then set

$$c[i,j] \leftarrow w[i,j] + \min_{r[i,j-1] \le k \le r[i+1,j]}\bigl(c[i,k-1] + c[k,j]\bigr),$$

and set r[i, j] to a value of k for which the minimum occurs. (Exercise 22 proves that r[i, j−1] ≤ r[i+1, j].) ∎
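Here is a Python transcription of Algorithm K, offered as a sketch rather than a definitive implementation. The array layout and names are ours; `p[j-1]` is $p_j$ and `q[i]` is $q_i$. Step K4 scans only r[i, j−1] ≤ k ≤ r[i+1, j], and when several k attain the minimum this sketch takes the smallest. It also includes an unrestricted evaluation of (16) so that the two can be compared.

```python
def algorithm_k(p, q):
    """Return the tables c, r, w of Algorithm K for weights (p_1..p_n; q_0..q_n)."""
    n = len(p)
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):                        # K1. Initialize.
        w[i][i] = q[i]
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
    for j in range(1, n + 1):                     # the 1-node optimum trees
        c[j - 1][j] = w[j - 1][j]
        r[j - 1][j] = j
    for d in range(2, n + 1):                     # K2. Loop on d.
        for j in range(d, n + 1):                 # K3. Loop on j.
            i = j - d                             # K4. Find c[i][j], r[i][j].
            k = min(range(r[i][j - 1], r[i + 1][j] + 1),
                    key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][k - 1] + c[k][j]
            r[i][j] = k
    return c, r, w

def tree(r, i, j):
    """t(i, j) as a nested tuple (root, left, right), or None when i == j."""
    if i == j:
        return None
    k = r[i][j]
    return (k, tree(r, i, k - 1), tree(r, k, j))

def naive_cost(p, q):
    """Recurrence (16) with k ranging over all of i < k <= j, for comparison."""
    n = len(p)

    def w(i, j):
        return sum(p[i:j]) + sum(q[i:j + 1])

    c = {(i, i): 0 for i in range(n + 1)}
    for d in range(1, n + 1):
        for i in range(n - d + 1):
            j = i + d
            c[i, j] = w(i, j) + min(c[i, k - 1] + c[k, j] for k in range(i + 1, j + 1))
    return c[0, n]

c, r, w = algorithm_k([2, 3, 5], [7, 11, 13, 17])
print(c[0][3], tree(r, 0, 3))
```

The window in step K4 is what brings the running time down to O(n²); `naive_cost` does the same minimization over every k and therefore takes O(n³).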

As an example of Algorithm K, consider Fig. 15, which is based on a “keyword-in-context” (KWIC) indexing application. The titles of all articles in the first ten volumes of the Journal of the ACM were sorted to prepare a concordance in which there was one line for every word of every title. However, certain words like “THE” and “EQUATION” were felt to be sufficiently uninformative that they were left out of the index. These special words and their frequency of occurrence are shown in the internal nodes of Fig. 15. Notice that a title such as “On the solution of an equation for a certain new problem” would be so uninformative, it wouldn’t appear in the index at all! The idea of KWIC indexing is due to H. P. Luhn, Amer. Documentation 11 (1960), 288-295. (See W. W. Youden, JACM 10 (1963), 583-646, where the full KWIC index appears.)


Fig. 15. An optimum binary search tree for a KWIC indexing application.

When  preparing  a KWIC  index  file  for  sorting,  we  might  want  to  use  a 
binary  search  tree  in  order  to  test  whether  or  not  each  particular  word  is  to  be 
indexed.  The  other  words  fall  between  two  of  the  unindexed  words,  with  the 
frequencies  shown  in  the  external  nodes  of  Fig.  15;  thus,  exactly  277  words  that 
are  alphabetically  between  “PROBLEMS”  and  “SOLUTION”  appeared  in  the  JACM 
titles  during  1954-1963. 

Figure 15 shows the optimum tree obtained by Algorithm K, with n = 35. The computed values of r[0, j] for j = 1, 2, ..., 35 are (1, 1, 2, 3, 3, 3, 3, 8, 8, 8, 8, 8, 8, 11, 11, ..., 11, 21, 21, 21, 21, 21, 21); the values of r[i, 35] for i = 0, 1, ..., 34 are (21, 21, ..., 21, 25, 25, 25, 25, 25, 25, 26, 26, 26, 30, 30, 30, 30, 30, 30, 30, 33, 33, 33, 35, 35).

The “betweenness frequencies” $q_j$ have a noticeable effect on the optimum tree structure; Fig. 16(a) shows the optimum tree that would have been obtained with the $q_j$ set to zero. Similarly, the internal frequencies $p_i$ are important; Fig. 16(b) shows the optimum tree when the $p_i$ are set to zero. Considering the full set of frequencies, the tree of Fig. 15 requires only 4.15 comparisons, on the average, while the trees of Fig. 16 require, respectively, 4.69 and 4.72.
Fig. 16. Optimum binary search trees based on half of the data of Fig. 15: (a) external frequencies suppressed; (b) internal frequencies suppressed.


Since  Algorithm  K requires  time  and  space  proportional  to  n2,  it  becomes 
impractical  when  n is  very  large.  Of  course  we  may  not  really  want  to  use  binary 
search  trees  for  large  n,  in  view  of  the  other  search  techniques  to  be  discussed 
later  in  this  chapter;  but  let’s  assume  anyway  that  we  want  to  find  an  optimum 
or  nearly  optimum  tree  when  n is  large. 

We have seen that the idea of inserting the keys in order of decreasing frequency can tend to make a fairly good tree, on the average; but it can also be very bad (see exercise 20), and it is not usually very near the optimum, since it makes no use of the $q_j$ weights. Another approach is to choose the root k so that the resulting maximum subtree weight, max(w(0, k−1), w(k, n)), is as small as possible. This approach can also be fairly poor, because it may choose a node with very small $p_k$ to be the root; however, Theorem M below shows that the resulting tree will not be extremely far from the optimum.
A more satisfactory procedure can be obtained by combining these two methods, as suggested by W. A. Walker and C. C. Gotlieb [Graph Theory and Computing (Academic Press, 1972), 303-323]: Try to equalize the left-hand and right-hand weights, but be prepared to move the root a few steps to the left or right to find a node with relatively large $p_k$. Figure 17 shows why this method is reasonable: If we plot c(0, k−1) + c(k, n) as a function of k, for the KWIC data of Fig. 15, we see that the result is quite sensitive to the magnitude of $p_k$.

A top-down method such as this can be used for large n to choose the root and then to work on the left and the right subtrees. When we get down to a sufficiently small subtree we can apply Algorithm K. The resulting method yields fairly good trees (reportedly within 2 or 3 percent of the optimum), and it requires only O(n) units of space, O(n log n) units of time. In fact, M. Fredman has shown that O(n) units of time suffice, if suitable data structures are used [STOC 7 (1975), 240-244]; see K. Mehlhorn, Data Structures and Algorithms 1 (Springer, 1984), Section 4.2.

Optimum trees and entropy. The minimum cost is closely related to a mathematical concept called entropy, which was introduced by Claude Shannon in his seminal work on information theory [Bell System Tech. J. 27 (1948), 379-423, 623-656]. If $p_1, p_2, \ldots, p_n$ are probabilities with $p_1 + p_2 + \cdots + p_n = 1$, we define the entropy $H(p_1, p_2, \ldots, p_n)$ by the formula

$$H(p_1, p_2, \ldots, p_n) = \sum_{k=1}^{n} p_k \lg\frac{1}{p_k}. \qquad (18)$$

Intuitively, if n events are possible and the kth event occurs with probability $p_k$, we can imagine that we have received $\lg(1/p_k)$ bits of information when the kth event has occurred. (An event of probability 1/32 gives 5 bits of information, etc.) Then $H(p_1, p_2, \ldots, p_n)$ is the expected number of bits of information in a random event. If $p_k = 0$, we define $p_k \lg(1/p_k) = 0$, because

$$\lim_{\epsilon \to 0^+} \epsilon \lg\frac{1}{\epsilon} = \lim_{m \to \infty} \frac{1}{m}\lg m = 0.$$

This convention allows us to use (18) when some of the probabilities are zero.
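Definition (18), including the convention for zero probabilities, takes only a line of Python. This sketch (the function name is ours) uses `math.log2`. Its test also spot-checks (19), which says that no distribution on n outcomes has entropy above lg n.

```python
import math

def entropy(probs):
    """H(p_1, ..., p_n) of (18) in bits; terms with p_k = 0 contribute 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 32] * 32))         # 32 equally likely events: 5.0
print(entropy([0.5, 0.25, 0.25, 0]))  # the zero probability is skipped: 1.5
```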

The function $x\lg(1/x)$ is concave; that is, its second derivative, $-1/(x\ln 2)$, is negative. Therefore the maximum value of $H(p_1, p_2, \ldots, p_n)$ occurs when $p_1 = p_2 = \cdots = p_n = 1/n$, namely

$$H\Bigl(\frac1n, \frac1n, \ldots, \frac1n\Bigr) = \lg n. \qquad (19)$$

In general, if we specify $p_1, \ldots, p_{n-k}$ but allow the other probabilities $p_{n-k+1}, \ldots, p_n$ to vary, we have

$$H(p_1, \ldots, p_{n-k}, p_{n-k+1}, \ldots, p_n) \le H\Bigl(p_1, \ldots, p_{n-k}, \frac{q}{k}, \ldots, \frac{q}{k}\Bigr) = H(p_1, \ldots, p_{n-k}, q) + q\lg k, \qquad (20)$$

$$H(p_1, \ldots, p_{n-k}, p_{n-k+1}, \ldots, p_n) \ge H(p_1, \ldots, p_{n-k}, q, 0, \ldots, 0) = H(p_1, \ldots, p_{n-k}, q), \qquad (21)$$

where $q = 1 - (p_1 + \cdots + p_{n-k})$.

Consider any not-necessarily-binary tree in which probabilities have been assigned to the leaves, say

[Diagram: a tree whose root (A) has three branches, leading to node (B), leaf [5], and node (E); (B) has branches to leaf [1], node (C), and node (D); (C) has the single leaf [2]; (D) has leaves [3] and [4]; (E) has leaves [6], [7], [8], [9].]

Here $p_k$ represents the probability that a search procedure will end at leaf [k]. Then the branching at each internal (nonleaf) node corresponds to a local probability distribution based on the sums of leaf probabilities below each branch. For example, at node (A) the first, second, and third branches are taken with the respective probabilities

$$(p_1 + p_2 + p_3 + p_4,\; p_5,\; p_6 + p_7 + p_8 + p_9),$$

and at node (B) the probabilities are

$$(p_1,\; p_2,\; p_3 + p_4)/(p_1 + p_2 + p_3 + p_4).$$
Let us say that each internal node has the entropy of its local probability distribution; thus

$$H(A) = (p_1+p_2+p_3+p_4)\lg\frac{1}{p_1+p_2+p_3+p_4} + p_5\lg\frac{1}{p_5} + (p_6+p_7+p_8+p_9)\lg\frac{1}{p_6+p_7+p_8+p_9};$$

$$H(B) = \frac{p_1}{p_1+p_2+p_3+p_4}\lg\frac{p_1+p_2+p_3+p_4}{p_1} + \frac{p_2}{p_1+p_2+p_3+p_4}\lg\frac{p_1+p_2+p_3+p_4}{p_2} + \frac{p_3+p_4}{p_1+p_2+p_3+p_4}\lg\frac{p_1+p_2+p_3+p_4}{p_3+p_4};$$

$$H(C) = \frac{p_2}{p_2}\lg\frac{p_2}{p_2};$$

$$H(D) = \frac{p_3}{p_3+p_4}\lg\frac{p_3+p_4}{p_3} + \frac{p_4}{p_3+p_4}\lg\frac{p_3+p_4}{p_4};$$

$$H(E) = \frac{p_6}{p_6+p_7+p_8+p_9}\lg\frac{p_6+p_7+p_8+p_9}{p_6} + \frac{p_7}{p_6+p_7+p_8+p_9}\lg\frac{p_6+p_7+p_8+p_9}{p_7} + \frac{p_8}{p_6+p_7+p_8+p_9}\lg\frac{p_6+p_7+p_8+p_9}{p_8} + \frac{p_9}{p_6+p_7+p_8+p_9}\lg\frac{p_6+p_7+p_8+p_9}{p_9}.$$

Lemma E. The sum of p(α)H(α) over all internal nodes α of a tree, where p(α) is the probability of reaching node α and H(α) is the entropy of α, equals the entropy of the probability distribution on the leaves.

Proof. It is easy to establish this identity by induction from bottom to top. For example, we have

$$H(A) + (p_1+p_2+p_3+p_4)H(B) + p_2H(C) + (p_3+p_4)H(D) + (p_6+p_7+p_8+p_9)H(E) = p_1\lg\frac{1}{p_1} + p_2\lg\frac{1}{p_2} + \cdots + p_9\lg\frac{1}{p_9}$$

with respect to the formulas above; all terms involving $\lg(p_1+p_2+p_3+p_4)$, $\lg(p_3+p_4)$, and $\lg(p_6+p_7+p_8+p_9)$ cancel out. ∎
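Lemma E can be spot-checked numerically on the nine-leaf example. In this Python sketch (our own representation), an internal node is a Python list of its subtrees and a leaf is a bare number, its probability. `weighted_node_entropy` adds up p(α)H(α) over the internal nodes.

```python
import math
import random

def weight(node):
    """Total leaf probability below node; this is p(alpha) for an internal node."""
    return sum(weight(child) for child in node) if isinstance(node, list) else node

def local_entropy(weights):
    """Entropy of the branch distribution weights / sum(weights)."""
    total = sum(weights)
    return sum(x / total * math.log2(total / x) for x in weights if x > 0)

def weighted_node_entropy(node):
    """Sum of p(alpha) * H(alpha) over the internal nodes alpha of the tree."""
    if not isinstance(node, list):
        return 0.0
    return (weight(node) * local_entropy([weight(child) for child in node])
            + sum(weighted_node_entropy(child) for child in node))

raw = [random.uniform(0.01, 1.0) for _ in range(9)]
total = sum(raw)
p = [x / total for x in raw]                  # p[0] is p_1, ..., p[8] is p_9
# Node A, containing B = [p1, C, D], leaf p5, and E = [p6, p7, p8, p9].
A = [[p[0], [p[1]], [p[2], p[3]]], p[4], [p[5], p[6], p[7], p[8]]]
leaf_entropy = sum(x * math.log2(1 / x) for x in p)
print(weighted_node_entropy(A), leaf_entropy)  # equal up to rounding error
```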

As  a consequence  of  Lemma  E,  we  can  use  entropy  to  establish  a convenient 
lower  bound  on  the  cost  of  any  binary  tree. 

Theorem B. Let $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ be nonnegative weights as in Algorithm K, normalized so that $p_1 + \cdots + p_n + q_0 + \cdots + q_n = 1$, and let $P = p_1 + \cdots + p_n$ be the probability of a successful search. Let

$$H = H(p_1, \ldots, p_n, q_0, \ldots, q_n)$$

be the entropy of the corresponding probability distribution, and let C be the minimum cost, (14). Then if $H \ge 2P/e$ we have

$$C \ge H - P\lg\frac{eH}{2P}. \qquad (23)$$
Proof. Take a binary tree of cost C and assign the probabilities $q_k$ to its leaves. Also add a middle branch below each internal node, leading to a new leaf that has probability $p_k$. Then $C = \sum p(\alpha)$, summed over the internal nodes α of the resulting ternary tree, and $H = \sum p(\alpha)H(\alpha)$ by Lemma E.

The entropy H(α) corresponds to a three-way distribution, where one of the probabilities is $p_j/p(\alpha)$ if α is internal node (j). Exercise 35 proves that

$$H(p, q, r) \le p\lg x + 1 + \lg\Bigl(1 + \frac{1}{2x}\Bigr)$$

for all x > 0, whenever p + q + r = 1. Therefore we have the inequality

$$H = \sum_{\alpha} p(\alpha)H(\alpha) \le \sum_{j=1}^{n} p_j\lg x + \Bigl(1 + \lg\Bigl(1 + \frac{1}{2x}\Bigr)\Bigr)C \qquad (24)$$

for all positive x. Choosing 2x = H/P now leads to the desired result, since

$$C \ge \frac{H - P\lg(H/2P)}{1 + \lg(1 + P/H)} \ge \frac{H}{H + P\lg e}\Bigl(H - P\lg\frac{H}{2P}\Bigr) \ge H - P\lg\frac{eH}{2P},$$

using the fact that $\lg(1 + y) \le y\lg e$ for all y > 0. ∎

Equation (23) does not necessarily hold when the entropy is extremely low. But the restriction to cases where H ≥ 2P/e is not severe, since the value of H is usually near lg n; see exercise 37. Notice that the proof doesn’t actually use the
left-to-right  order  of  the  nodes;  the  lower  bound  (23)  holds  for  any  binary  search 
tree  that  has  internal  node  probabilities  pj  and  external  node  probabilities  qk  in 
any  order. 

Entropy  calculations  also  yield  an  upper  bound  that  is  not  too  far  from  (23), 
even  when  we  do  stick  to  the  left-to-right  order: 


Theorem M. Under the assumptions of Theorem B, we also have

$$C \le H + 2 - P. \qquad (25)$$


Proof. Form the n + 1 sums $s_0 = \frac12 q_0$, $s_1 = q_0 + p_1 + \frac12 q_1$, $s_2 = q_0 + p_1 + q_1 + p_2 + \frac12 q_2$, ..., $s_n = q_0 + p_1 + \cdots + q_{n-1} + p_n + \frac12 q_n$; we may assume that $s_0 < s_1 < \cdots < s_n$ (see exercise 38). Express each $s_k$ as a binary fraction, writing $s_n = (.111\ldots)_2$ if $s_n = 1$. Then let the string $\sigma_k$ be the leading bits of $s_k$, retaining just enough bits to distinguish $s_k$ from $s_j$ for $j \ne k$. For example, we might have n = 3 and

   $s_0 = (.0000001)_2$,    $\sigma_0$ = 00000;
   $s_1 = (.0000101)_2$,    $\sigma_1$ = 00001;
   $s_2 = (.0001011)_2$,    $\sigma_2$ = 0001;
   $s_3 = (.1100000)_2$,    $\sigma_3$ = 1.
Construct a binary tree with n + 1 leaves, in such a way that $\sigma_k$ corresponds to the path from the root to [k] for 0 ≤ k ≤ n, where ‘0’ denotes a left branch and ‘1’ denotes a right branch. Also, if $\sigma_{k-1}$ has the form $\alpha_k 0 \beta_k$ and $\sigma_k$ has the form $\alpha_k 1 \gamma_k$ for some $\alpha_k$, $\beta_k$, and $\gamma_k$, let the internal node (k) correspond to the path $\alpha_k$. Thus we would have

[Diagram: the tree for this example, with internal node (3) at the root (empty path $\alpha_3$), (2) on path $\alpha_2$ = 000, and (1) on path $\alpha_1$ = 0000.]

in the example above. There may be some internal nodes that are still nameless; replace each of them by their one and only child. The cost of the resulting tree is at most $\sum_{k=1}^{n} p_k(|\alpha_k| + 1) + \sum_{k=0}^{n} q_k|\sigma_k|$.

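We have

$$p_k \le \tfrac12 q_{k-1} + p_k + \tfrac12 q_k = s_k - s_{k-1} < 2^{-|\alpha_k|}, \qquad (26)$$

because $s_k < (.\alpha_k)_2 + 2^{-|\alpha_k|}$ and $s_{k-1} \ge (.\alpha_k)_2$. Furthermore, if $q_k \ge 2^{-t}$ we have $s_k \ge s_{k-1} + 2^{-t-1}$ and $s_{k+1} \ge s_k + 2^{-t-1}$, hence $|\sigma_k| \le t + 1$. It follows that $q_k < 2^{2-|\sigma_k|}$, and we have constructed a binary tree of cost

$$\le \sum_{k=1}^{n} p_k\bigl(1 + |\alpha_k|\bigr) + \sum_{k=0}^{n} q_k|\sigma_k| < \sum_{k=1}^{n} p_k\Bigl(1 + \lg\frac{1}{p_k}\Bigr) + \sum_{k=0}^{n} q_k\Bigl(2 + \lg\frac{1}{q_k}\Bigr) = P + 2(1 - P) + H = H + 2 - P. \quad ∎$$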
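The construction in this proof is concrete enough to run. The following Python sketch (helper names ours) computes the strings $\sigma_k$ and the lengths $|\alpha_k|$ with exact binary fractions. It assumes all weights are positive, so that $s_0 < s_1 < \cdots < s_n$ and $s_n < 1$. It then evaluates the bound $\sum p_k(1 + |\alpha_k|) + \sum q_k|\sigma_k|$ that the proof compares with H + 2 − P, and it reproduces the example strings above.

```python
import math
from fractions import Fraction

def shared_bits(a, b):
    """Number of leading binary digits shared by distinct fractions a, b in [0, 1)."""
    count = 0
    while True:
        a, b = 2 * a, 2 * b
        if int(a) != int(b):
            return count
        a, b = a - int(a), b - int(b)
        count += 1

def leading_bits(s, length):
    """The first `length` binary digits of s, 0 <= s < 1, as a string."""
    digits = []
    for _ in range(length):
        s *= 2
        digits.append("1" if s >= 1 else "0")
        s -= int(s)
    return "".join(digits)

def sigma_strings(s):
    """sigma_k: just enough leading bits of s_k to distinguish it from its neighbors."""
    result = []
    for k, sk in enumerate(s):
        neighbors = [s[j] for j in (k - 1, k + 1) if 0 <= j < len(s)]
        length = 1 + max(shared_bits(sk, sj) for sj in neighbors) if neighbors else 0
        result.append(leading_bits(sk, length))
    return result

def theorem_m_bound(p, q):
    """Return (cost bound of the construction, H + 2 - P) for positive weights summing to 1."""
    s, run = [], Fraction(0)
    for k, qk in enumerate(q):
        if k > 0:
            run += p[k - 1]
        s.append(run + qk / 2)          # s_k = q_0 + p_1 + ... + p_k + q_k/2
        run += qk
    sigma = sigma_strings(s)
    alpha_len = [shared_bits(s[k - 1], s[k]) for k in range(1, len(s))]
    cost = (sum(pk * (1 + a) for pk, a in zip(p, alpha_len))
            + sum(qk * len(sk) for qk, sk in zip(q, sigma)))
    H = sum(float(x) * math.log2(1 / float(x)) for x in list(p) + list(q))
    return float(cost), H + 2 - float(sum(p))

example = [Fraction(1, 128), Fraction(5, 128), Fraction(11, 128), Fraction(96, 128)]
print(sigma_strings(example))  # ['00000', '00001', '0001', '1']
```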

In the KWIC indexing application of Fig. 15, we have P = 1304/3288 ≈ 0.39659, and $H(p_1, \ldots, p_{35}, q_0, \ldots, q_{35})$ ≈ 5.00635. Therefore Theorem B tells us that C ≥ 3.3800, and Theorem M tells us that C ≤ 6.6098.
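Both bounds are simple to evaluate. This Python sketch (names ours) recomputes the two numbers just quoted from P = 1304/3288 and H ≈ 5.00635. It also includes a small O(n³) evaluation of (16), so that the inequalities (23) and (25) can be spot-checked on random normalized weights.

```python
import math

def lower_bound_23(H, P):
    """Theorem B: C >= H - P lg(eH/(2P)), valid when H >= 2P/e."""
    return H - P * math.log2(math.e * H / (2 * P))

def upper_bound_25(H, P):
    """Theorem M: C <= H + 2 - P."""
    return H + 2 - P

def optimum_cost(p, q):
    """Minimum cost (14) via recurrence (16); p[j-1] = p_j and q[i] = q_i."""
    n = len(p)
    c = {(i, i): 0.0 for i in range(n + 1)}
    for d in range(1, n + 1):
        for i in range(n - d + 1):
            j = i + d
            w = sum(p[i:j]) + sum(q[i:j + 1])
            c[i, j] = w + min(c[i, k - 1] + c[k, j] for k in range(i + 1, j + 1))
    return c[0, n]

def entropy(probs):
    """Entropy (18) in bits."""
    return sum(x * math.log2(1 / x) for x in probs if x > 0)

P, H = 1304 / 3288, 5.00635
print(f"{lower_bound_23(H, P):.4f} {upper_bound_25(H, P):.4f}")  # 3.3800 6.6098
```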

*The Garsia-Wachs algorithm. An amazing improvement on Algorithm K is possible in the special case that $p_1 = \cdots = p_n = 0$. This case, in which only the leaf probabilities $(q_0, q_1, \ldots, q_n)$ are relevant, is especially important because it arises in several significant applications. Let us therefore assume in the remainder of this section that the probabilities $p_j$ are zero. Notice that Theorems B and M reduce to the inequalities

$$H(q_0, q_1, \ldots, q_n) \le C(q_0, q_1, \ldots, q_n) < H(q_0, q_1, \ldots, q_n) + 2 \qquad (27)$$

in this case, because we cannot have C = H + 2 − P unless P = 1; and the cost function (14) simplifies to

$$C = \sum_{k=0}^{n} q_k l_k, \qquad l_k = \text{the level of } [k]. \qquad (28)$$

A simpler algorithm is possible because of the following key property:
Lemma W. If $q_{k-1} > q_{k+1}$ then $l_k \le l_{k+1}$ in every optimum tree. If $q_{k-1} = q_{k+1}$ then $l_k \le l_{k+1}$ in some optimum tree.

Proof. Suppose $q_{k-1} > q_{k+1}$ and consider a tree in which $l_k > l_{k+1}$. Then [k] must be a right child, and its left sibling L is a subtree of weight $w \ge q_{k-1}$. Replace the parent of [k] by L; replace [k+1] by a node whose children are [k] and [k+1]. This changes the overall cost by $-w - q_k(l_k - l_{k+1} - 1) + q_{k+1} \le q_{k+1} - q_{k-1}$. So the given tree was not optimum if $q_{k-1} > q_{k+1}$, and an optimum tree has been transformed into another optimum tree if $q_{k-1} = q_{k+1}$. In the latter case we have found an optimum tree in which $l_k = l_{k+1}$. ∎

A deeper  analysis  of  the  structure  tells  us  considerably  more. 

Lemma X. Suppose j and k are indices such that j < k and we have

i) $q_{i-1} > q_{i+1}$ for $1 \le i < k$;

ii) $q_{k-1} \le q_{k+1}$;

iii) $q_i < q_{k-1} + q_k$ for $j \le i < k - 1$; and

iv) $q_{j-1} \ge q_{k-1} + q_k$.

Then there is an optimum tree in which $l_{k-1} = l_k$ and either

a) $l_j = l_k - 1$, or

b) $l_j = l_k$ and [j] is a left child.

Proof. By reversing left and right in Lemma W, we see that (ii) implies the existence of an optimum tree in which $l_{k-1} \ge l_k$. But Lemma W and (i) also imply that $l_1 \le l_2 \le \cdots \le l_k$. Therefore $l_{k-1} = l_k$.

Suppose $l_s < l_k - 1 \le l_{s+1}$ for some s with $j \le s < k - 1$. Let t be the smallest index < k such that $l_t = l_k$. Then $l_i = l_k - 1$ for s < i < t, and [s+1] is a left child; possibly s + 1 = t. Furthermore [t] and [t+1] are siblings. Replace their parent by [t+1]; replace [i] by [i+1] for s < i < t; and replace the external node [s] by an internal node whose children are [s] and [s+1]. This change increases the cost by $\le q_s - q_t - q_{t+1} \le q_s - q_{k-1} - q_k$, so it is an improvement if $q_s < q_{k-1} + q_k$. Therefore, by (iii), $l_j \ge l_k - 1$.

We still have not used hypothesis (iv). If $l_j = l_k$ and [j] is not a left child, [j] must be the right sibling of [j−1]. Replace their parent by [j−1]; then replace leaf [i] by [i−1] for j < i < k; and replace the external node [k] by an internal node whose children are [k−1] and [k]. The cost increases by $-q_{j-1} + q_{k-1} + q_k \le 0$, so we obtain an optimum tree satisfying (a) or (b). ∎

Lemma Y. Let j and k be as in Lemma X, and consider the modified probabilities

$$(q'_0, \ldots, q'_{n-1}) = (q_0, \ldots, q_{j-1}, q_{k-1} + q_k, q_j, \ldots, q_{k-2}, q_{k+1}, \ldots, q_n)$$

obtained by removing $q_{k-1}$ and $q_k$ and inserting $q_{k-1} + q_k$ after $q_{j-1}$. Then

$$C(q'_0, \ldots, q'_{n-1}) \le C(q_0, \ldots, q_n) - (q_{k-1} + q_k). \qquad (29)$$

Proof. It suffices to show that any optimum tree for $(q_0, \ldots, q_n)$ can be transformed into a tree of the same cost in which [k−1] and [k] are siblings and the leaves appear in the permuted order

$$[0] \ldots [j-1]\;[k-1]\;[k]\;[j] \ldots [k-2]\;[k+1] \ldots [n]. \qquad (30)$$


Fig. 18. The Garsia-Wachs algorithm applied to alphabetic frequency data: Phases 1 and 2.
We start with the tree constructed in Lemma X. If it is of type (b), we simply rename the leaves, sliding [k−1] and [k] to the left by k − 1 − j places. If it is of type (a), suppose $l_{s-1} = l_k - 1$ and $l_s = l_k$; we proceed as follows: First slide [k−1] and [k] left by k − 1 − s places; then replace their (new) parent by [s−1]; finally replace [j] by a node whose children are [k−1] and [k], and replace node [i] by [i−1] for j < i < s. ∎

Lemma  Z.  Under  the  hypotheses  of  Lemma  Y,  equality  holds  in  (29). 

Proof.  Every  tree  for  (q'0, . . . , q'n_i)  corresponds  to  a tree  with  leaves  (30)  in 
which  the  two  out-of-order  leaf  nodes  \k- 1 1 and  [fc]  are  siblings.  Let  internal 
node  (x)  be  their  parent.  We  want  to  show  that  any  optimum  tree  of  that  type 
can  be  converted  to  a tree  of  the  same  cost  in  which  the  leaves  appear  in  normal 
order  [o]  . . . [n] . 

There is nothing to prove if $j = k-1$. Otherwise we have $q'_{i-1} > q'_{i+1}$ for $j \le i < k-1$, because $q_{j-1} \ge q_{k-1} + q_k > q_j$. Therefore by Lemma W we have $l_x \le l_j \le \cdots \le l_{k-2}$, where $l_x$ is the level of (x) and $l_i$ is the level of [i] for $j \le i < k-1$. If $l_x = l_{k-2}$, we simply slide node (x) to the right, replacing the sequence (x) [j] ... [k-2] by [j] ... [k-2] (x); this straightens out the leaves as desired.

Otherwise suppose $l_s = l_x$ and $l_{s+1} > l_x$. We first replace (x) [j] ... [s] by [j] ... [s] (x); this makes $l \le l_{s+1} \le \cdots \le l_{k-2}$, where $l = l_x + 1$ is the common level of nodes [k-1] and [k]. Finally replace nodes

[k-1] [k] [s+1] ... [k-2]

by the cyclically shifted sequence

[s+1] ... [k-2] [k-1] [k].

Exercise 40 proves that these changes decrease the cost, unless $l_{k-2} = l$. But the cost cannot decrease, because of Lemma Y. Therefore $l_{k-2} = l$, and the proof is complete. ∎

These lemmas show that the problem for $n+1$ weights $q_0, q_1, \ldots, q_n$ can be reduced to an $n$-weight problem: We first find the smallest index $k$ with $q_{k-1} \le q_{k+1}$; then we find the largest $j < k$ with $q_{j-1} \ge q_{k-1} + q_k$; then we remove $q_{k-1}$ and $q_k$ from the list, and insert the sum $q_{k-1} + q_k$ just after $q_{j-1}$. In the special cases $j = 0$ or $k = n$, the proofs show that we should proceed as if infinite weights $q_{-1}$ and $q_{n+1}$ were present at the left and right. The proofs also show that any optimum tree $T'$ that is obtained from the new weights $(q'_0, \ldots, q'_{n-1})$ can be rearranged into a tree $T$ that has the original weights $(q_0, \ldots, q_n)$ in the correct left-to-right order; moreover, each weight will appear at the same level in both $T$ and $T'$.
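A single reduction step can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the weights live in an ordinary list, the name `combine_once` is hypothetical, and the infinite weights $q_{-1}$ and $q_{n+1}$ are simulated by a helper instead of being stored. Each call turns $n+1$ weights into $n$ weights, exactly as described above.

```python
import math

INF = math.inf


def combine_once(q):
    """Perform one reduction step on the weights q[0..n] (n >= 1).

    Returns the new list of n weights.  The infinite sentinels q[-1] and
    q[n+1] are simulated by the helper `weight`, not stored in q.
    """
    n = len(q) - 1

    def weight(i):
        return q[i] if 0 <= i <= n else INF

    # Smallest k with q[k-1] <= q[k+1]; k = n always qualifies.
    k = next(k for k in range(1, n + 1) if weight(k - 1) <= weight(k + 1))
    s = q[k - 1] + q[k]
    # Largest j < k with q[j-1] >= s; j = 0 always qualifies.
    j = max(j for j in range(k) if weight(j - 1) >= s)
    rest = q[:k - 1] + q[k + 1:]       # remove q[k-1] and q[k]
    return rest[:j] + [s] + rest[j:]   # insert s just after q[j-1]


weights = [186, 64, 13, 22, 32, 103]
while len(weights) > 1:
    weights = combine_once(weights)
    print(weights)
```

Applied to the first six letter weights of the example that follows, the first three printed lists are the sequences discussed there; later lines differ from Fig. 18 only because the rest of the alphabet has been omitted.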

For example, Fig. 18 illustrates the construction when the weights $q_k$ are the relative frequencies of the characters ␣, A, B, ..., Z in English text. The first few weights are


186,  64,  13,  22,  32,  103,  . . . 
and we have 186 > 13, 64 > 22, 13 < 32; therefore we replace “13, 22” by 35. In
the  new  sequence 

186,  64,  35,  32,  103,  . . . 

we  replace  “35,32”  by  67  and  slide  67  to  the  left  of  64,  obtaining 

186, 67, 64, 103, ...

Then  “67, 64”  becomes  131,  and  we  begin  to  examine  the  weights  that  follow  103. 
After  the  27  original  weights  have  been  combined  into  the  single  weight  1000,  the 
history  of  successive  combinations  specifies  a binary  tree  whose  weighted  path 
length  is  the  solution  to  the  original  problem. 
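To see that the history of combinations really does specify an optimum tree, the sketch below repeats the reduction while remembering which subtrees were merged, and then reads off the level of every original weight (phase 2). It is a hedged illustration rather than Algorithm G itself: it uses a simple quadratic list representation, and the helper names, as well as the brute-force checker `optimum_cost` (the cubic recurrence for optimum alphabetic trees with all $p$'s zero), are our own assumptions.

```python
import math
import random

INF = math.inf


def garsia_wachs_levels(q):
    """Leaf levels l_0..l_n from phases 1 and 2 (simple O(n^2) version).

    Each working entry is (weight, subtree); a subtree is a leaf index or
    a pair (left, right).  The merged tree may be tangled, but its leaf
    levels are those of an optimum alphabetic tree.
    """
    items = [(w, i) for i, w in enumerate(q)]

    def weight(i):
        return items[i][0] if 0 <= i < len(items) else INF

    while len(items) > 1:
        n = len(items) - 1
        k = next(k for k in range(1, n + 1) if weight(k - 1) <= weight(k + 1))
        s = items[k - 1][0] + items[k][0]
        j = max(j for j in range(k) if weight(j - 1) >= s)
        node = (s, (items[k - 1][1], items[k][1]))
        rest = items[:k - 1] + items[k + 1:]
        items = rest[:j] + [node] + rest[j:]

    levels = [0] * len(q)
    stack = [(items[0][1], 0)]
    while stack:                        # depth of every leaf from the root
        t, depth = stack.pop()
        if isinstance(t, tuple):
            stack.extend([(t[0], depth + 1), (t[1], depth + 1)])
        else:
            levels[t] = depth
    return levels


def optimum_cost(q):
    """Minimum of sum(q[i] * level(i)) over alphabetic trees, by brute force."""
    n = len(q)
    prefix = [0]
    for x in q:
        prefix.append(prefix[-1] + x)
    c = [[0] * n for _ in range(n)]
    for size in range(2, n + 1):
        for i in range(n - size + 1):
            j = i + size - 1
            c[i][j] = prefix[j + 1] - prefix[i] + min(
                c[i][r] + c[r + 1][j] for r in range(i, j))
    return c[0][n - 1]


random.seed(2)
for _ in range(200):
    q = [random.randint(0, 40) for _ in range(random.randint(1, 9))]
    levels = garsia_wachs_levels(q)
    assert sum(w * l for w, l in zip(q, levels)) == optimum_cost(q)
```

The random check only spot-tests the sketch; the proof of correctness is the one given by Lemmas W through Z.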

But the leaves of the tree in Fig. 18 are not at all in the correct order, because they get tangled up when we slide $q_{k-1} + q_k$ to the left (see exercise 41). Still, the proof of Lemma Z guarantees that there is a tree whose leaves are in the correct order and on exactly the same levels as in the tangled tree. This untangled tree, Fig. 19, is therefore optimum; it is the binary tree output by the Garsia-Wachs algorithm.

Algorithm G (Garsia-Wachs algorithm for optimum binary trees). Given a sequence of nonnegative weights $w_0, w_1, \ldots, w_n$, this algorithm constructs a binary tree with $n$ internal nodes for which $\sum_{k=0}^{n} w_k l_k$ is minimum, where $l_k$ is the distance of external node [k] from the root. It uses an array of $2n+2$ nodes whose addresses are $X_k$ for $0 \le k \le 2n+1$; each node has four fields called WT, LLINK, RLINK, and LEVEL. The leaves of the constructed tree will be nodes $X_0 \ldots X_n$; the internal nodes will be $X_{n+1} \ldots X_{2n}$; the root will be $X_{2n}$; and $X_{2n+1}$ is used as a temporary sentinel. The algorithm also maintains a working array of pointers $P_0, P_1, \ldots, P_t$, where $t \le n+1$.

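G1. [Begin phase 1.] Set $\mathrm{WT}(X_k) \gets w_k$ and $\mathrm{LLINK}(X_k) \gets \mathrm{RLINK}(X_k) \gets \Lambda$ for $0 \le k \le n$. Also set $P_0 \gets X_{2n+1}$, $\mathrm{WT}(P_0) \gets \infty$, $P_1 \gets X_0$, $t \gets 1$, $m \gets n$. Then perform step G2 for $r = 1, 2, \ldots, n$, and go to G3.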

G2. [Absorb $w_r$.] (At this point we have the basic condition

$$\mathrm{WT}(P_{i-1}) > \mathrm{WT}(P_{i+1}) \quad \text{for } 1 \le i < t; \qquad (31)$$

in other words, the weights in the working array are “2-descending.”) If $\mathrm{WT}(P_{t-1}) \le w_r$, set $k \gets t$, perform Subroutine C below, and repeat step G2. Otherwise set $t \gets t+1$ and $P_t \gets X_r$.

G3. [Finish phase 1.] While $t > 1$, set $k \gets t$ and perform Subroutine C below.

G4. [Do phase 2.] (Now $P_1 = X_{2n}$ is the root of a binary tree, and $\mathrm{WT}(P_1) = w_0 + \cdots + w_n$.) Set $l_k$ to the distance of node $X_k$ from node $P_1$, for $0 \le k \le n$. (See exercise 43. An example is shown in Fig. 18, where level numbers appear at the right of each node.)

G5. [Do phase 3.] By changing the links of $X_{n+1}, \ldots, X_{2n}$, construct a new binary tree having the same level numbers $l_k$, but with the leaf nodes in symmetric order $X_0, \ldots, X_n$. (See exercise 44; an example appears in Fig. 19.) ∎
Subroutine C (Combination). This recursive subroutine is the heart of the Garsia-Wachs algorithm. It combines two weights, shifts them left as appropriate, and maintains the 2-descending condition (31). Variables $j$ and $w$ are local, but variables $k$, $m$, and $t$ are global.

C1. [Create a new node.] (At this point we have $k \ge 2$.) Set $m \gets m+1$, $\mathrm{LLINK}(X_m) \gets P_{k-1}$, $\mathrm{RLINK}(X_m) \gets P_k$, $\mathrm{WT}(X_m) \gets w \gets \mathrm{WT}(P_{k-1}) + \mathrm{WT}(P_k)$.

C2. [Shift the following nodes left.] Set $t \gets t-1$, then $P_j \gets P_{j+1}$ for $k \le j \le t$.

C3. [Shift the preceding nodes right.] Set $j \gets k-2$; then while $\mathrm{WT}(P_j) < w$ set $P_{j+1} \gets P_j$ and $j \gets j-1$.

C4. [Insert the new node.] Set $P_{j+1} \gets X_m$.

C5. [Done?] If $j = 0$ or $\mathrm{WT}(P_{j-1}) > w$, exit the subroutine.

C6. [Restore (31).] Set $k \gets j$, $j \gets t-j$, and call Subroutine C recursively. Then reset $j \gets t-j$ (note that $t$ may have changed!) and return to step C5. ∎
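Steps G1 through G4, together with Subroutine C, can be transcribed almost line for line into Python. The following is a sketch for checking one's understanding, not an efficient implementation, and it rests on our own representation choices: node address $X_k$ is the integer k, the fields WT, LLINK, and RLINK become parallel lists, the globals k, m, t become `nonlocal` variables, and phase 3 (step G5) is omitted. The brute-force `optimum_cost` checker is likewise our own addition.

```python
import math
import random


def algorithm_g_phases_1_2(w):
    """Return (levels, root, WT(root)) for the weights w[0..n]."""
    n = len(w) - 1
    WT = list(w) + [0] * n + [math.inf]      # X_{2n+1} is the sentinel
    LLINK = [None] * (2 * n + 2)
    RLINK = [None] * (2 * n + 2)
    P = [2 * n + 1, 0] + [None] * n          # working array P_0 .. P_{n+1}
    t, m, k = 1, n, 0                        # G1

    def subroutine_c():
        nonlocal t, m, k
        m += 1                               # C1: create node X_m
        LLINK[m], RLINK[m] = P[k - 1], P[k]
        wt = WT[m] = WT[P[k - 1]] + WT[P[k]]
        t -= 1                               # C2: shift following nodes left
        for j in range(k, t + 1):
            P[j] = P[j + 1]
        j = k - 2                            # C3: shift preceding nodes right
        while WT[P[j]] < wt:
            P[j + 1] = P[j]
            j -= 1
        P[j + 1] = m                         # C4: insert the new node
        while j != 0 and WT[P[j - 1]] <= wt: # C5: is (31) violated?
            k = j                            # C6: combine P_{j-1} and P_j
            j = t - j                        # remember distance from the end
            subroutine_c()
            j = t - j                        # t may have changed!

    for r in range(1, n + 1):                # G2: absorb w_r
        while WT[P[t - 1]] <= w[r]:
            k = t
            subroutine_c()
        t += 1
        P[t] = r
    while t > 1:                             # G3: finish phase 1
        k = t
        subroutine_c()

    root = P[1]                              # G4: leaf levels (phase 2)
    levels = [0] * (n + 1)
    stack = [(root, 0)]
    while stack:
        x, depth = stack.pop()
        if x <= n:
            levels[x] = depth
        else:
            stack.extend([(LLINK[x], depth + 1), (RLINK[x], depth + 1)])
    return levels, root, WT[root]


def optimum_cost(q):
    """Brute-force optimum weighted path length for alphabetic trees."""
    n = len(q)
    prefix = [0]
    for x in q:
        prefix.append(prefix[-1] + x)
    c = [[0] * n for _ in range(n)]
    for size in range(2, n + 1):
        for i in range(n - size + 1):
            j = i + size - 1
            c[i][j] = prefix[j + 1] - prefix[i] + min(
                c[i][r] + c[r + 1][j] for r in range(i, j))
    return c[0][n - 1]


random.seed(3)
for _ in range(200):
    q = [random.randint(0, 40) for _ in range(random.randint(1, 10))]
    levels, root, total = algorithm_g_phases_1_2(q)
    assert root == 2 * (len(q) - 1) and total == sum(q)
    assert sum(a * b for a, b in zip(q, levels)) == optimum_cost(q)
```

Because this version shifts entries of a Python list, each call of the subroutine can cost order $n$ steps, matching the remark that follows about sequential memory.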

Subroutine C might need $\Omega(n)$ steps to create and insert a new node, because it uses sequential memory instead of linked lists. Therefore the total running time of Algorithm G might be $\Omega(n^2)$. But more elaborate data structures can be used to guarantee that phase 1 will require at most $O(n \log n)$ steps (see exercise 45). Phases 2 and 3 need only $O(n)$ steps.
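Phase 3 can be done in linear time with a stack; the sketch below shows one possible approach to exercise 44, not necessarily the book's answer. It reads the leaf levels from left to right and merges the top two stack entries whenever their roots lie on the same level, since two adjacent subtrees on the same level must be siblings. Trees are nested Python tuples, with leaf k represented by the integer k; the function name is our own.

```python
def tree_from_levels(levels):
    """Build a binary tree whose leaves 0..n appear in symmetric order at
    the given levels (phase 3).  Raises ValueError if no such tree exists."""
    stack = []   # entries (subtree, level of its root)
    for leaf, level in enumerate(levels):
        stack.append((leaf, level))
        # Adjacent subtrees on the same level are siblings: merge them.
        while len(stack) >= 2 and stack[-1][1] == stack[-2][1]:
            right, lev = stack.pop()
            left, _ = stack.pop()
            stack.append(((left, right), lev - 1))
    if len(stack) != 1 or stack[0][1] != 0:
        raise ValueError("levels do not describe a binary tree")
    return stack[0][0]


print(tree_from_levels([1, 3, 5, 5, 4, 2]))   # (0, ((1, ((2, 3), 4)), 5))
```

Each leaf is pushed once and each merge removes one entry, so the running time is proportional to $n$.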

Kleitman and Saks [SIAM J. Algeb. Discr. Methods 2 (1981), 142-146] proved that the optimum weighted path length never exceeds the value of the optimum weighted path length that occurs when the $q$'s have been rearranged in “sawtooth order”:

$$q_0 \le q_2 \le q_4 \le \cdots \le q_{2\lfloor n/2\rfloor} \le q_{2\lceil n/2\rceil - 1} \le \cdots \le q_3 \le q_1. \qquad (32)$$

(This is the inverse of the organ-pipe order discussed in exercise 6.1-18.) In the latter case the Garsia-Wachs algorithm essentially reduces to Huffman's algorithm on the weights $q_0 + q_1,\ q_2 + q_3,\ \ldots$, because the weights in the working array will actually be nonincreasing (not merely “2-descending” as in (31)). Therefore we can improve the upper bound of Theorem M without knowing the order of the weights.
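The sawtooth arrangement (32) is easy to produce: sort the weights and deal them out along the chain $q_0, q_2, q_4, \ldots$, then back through the odd positions from the largest odd index down to $q_1$. The sketch below does this and spot-checks the Kleitman-Saks inequality on small random inputs. The function names and the brute-force `optimum_cost` routine are illustrative assumptions, and a random test is of course no substitute for their proof.

```python
import random


def sawtooth(weights):
    """Rearrange weights so that (32) holds:
    q0 <= q2 <= q4 <= ... <= (largest odd index) <= ... <= q3 <= q1."""
    n = len(weights) - 1
    chain = list(range(0, n + 1, 2)) + list(range(1, n + 1, 2))[::-1]
    q = [0] * (n + 1)
    for position, value in zip(chain, sorted(weights)):
        q[position] = value
    return q


def optimum_cost(q):
    """Brute-force optimum weighted path length for alphabetic trees."""
    n = len(q)
    prefix = [0]
    for x in q:
        prefix.append(prefix[-1] + x)
    c = [[0] * n for _ in range(n)]
    for size in range(2, n + 1):
        for i in range(n - size + 1):
            j = i + size - 1
            c[i][j] = prefix[j + 1] - prefix[i] + min(
                c[i][r] + c[r + 1][j] for r in range(i, j))
    return c[0][n - 1]


print(sawtooth([1, 2, 3, 4, 5, 6]))   # [1, 6, 2, 5, 3, 4]

random.seed(4)
for _ in range(200):
    q = [random.randint(0, 30) for _ in range(random.randint(1, 9))]
    # The sawtooth order should be the worst arrangement of the same weights.
    assert optimum_cost(q) <= optimum_cost(sawtooth(q))
```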

The  optimum  binary  tree  in  Fig.  19  has  an  important  application  to  coding 
theory  as  well  as  to  searching:  Using  0 to  stand  for  a left  branch  in  the  tree  and 
1 to  stand  for  a right  branch,  we  obtain  the  following  variable-length  codewords: 


␣  00           I  1000           R  11001
A  0100         J  1001000        S  1101
B  010100       K  1001001        T  1110
C  010101       L  100101         U  111100
D  01011        M  10011          V  111101
E  0110         N  1010           W  111110        (33)
F  011100       O  1011           X  11111100
G  011101       P  110000         Y  11111101
H  01111        Q  110001         Z  1111111
Thus a message like “RIGHT ON” would be encoded by the string
1100110000111010111111100010111010. 

Decoding  from  left  to  right  is  easy,  in  spite  of  the  variable  length  of  the  codewords, 
because  the  tree  structure  tells  us  when  one  codeword  ends  and  another  begins. 
This method of coding preserves the alphabetical order of messages, and it uses an average of about 4.2 bits per letter. Thus the code could be used to compress data files, without destroying lexicographic order of alphabetic information. (The figure of 4.2 bits per letter is the minimum over all order-preserving binary tree codes, although it could be reduced to 4.1 bits per letter if we disregarded the alphabetic ordering constraint. A further reduction, preserving alphabetic order, could be achieved if pairs of letters instead of single letters were encoded.)
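Because no codeword in (33) is a prefix of another, a decoder can simply accumulate bits until they match a codeword. The following sketch, with the table typed in from (33) and helper names of our own choosing, checks the encoding of “RIGHT ON” given above, decodes it again, and confirms that the codewords sort in the same order as the letters they stand for.

```python
CODE = {
    " ": "00", "A": "0100", "B": "010100", "C": "010101", "D": "01011",
    "E": "0110", "F": "011100", "G": "011101", "H": "01111",
    "I": "1000", "J": "1001000", "K": "1001001", "L": "100101",
    "M": "10011", "N": "1010", "O": "1011", "P": "110000", "Q": "110001",
    "R": "11001", "S": "1101", "T": "1110", "U": "111100", "V": "111101",
    "W": "111110", "X": "11111100", "Y": "11111101", "Z": "1111111",
}
INVERSE = {bits: ch for ch, bits in CODE.items()}


def encode(message):
    return "".join(CODE[ch] for ch in message)


def decode(bits):
    """Left-to-right decoding; valid because the code is prefix-free."""
    out, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in INVERSE:          # a complete codeword has been read
            out.append(INVERSE[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


bits = encode("RIGHT ON")
print(bits)                            # 1100110000111010111111100010111010
print(decode(bits))                    # RIGHT ON
# Codewords sorted as bit strings appear in alphabetical order (blank first).
assert [CODE[c] for c in sorted(CODE)] == sorted(CODE.values())
```

Since the codewords are order-preserving and prefix-free, sorting encoded messages as bit strings gives the same order as sorting the messages themselves.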

History  and  bibliography.  The  tree  search  methods  of  this  section  were 
discovered  independently  by  several  people  during  the  1950s.  In  an  unpublished 
memorandum  dated  August  1952,  A.  I.  Dumey  described  a primitive  form  of 
tree  insertion  in  the  following  way: 

Consider a drum with $2^n$ item storages in it, each having a binary address.

Follow this program:

1. Read in the first item and store it in address $2^{n-1}$, i.e., at the halfway storage place.

2. Read in the next item. Compare it with the first.

3. If it is larger, put it in address $2^{n-1} + 2^{n-2}$. If it is smaller, put it at $2^{n-2}$. ...

Another  early  form  of  tree  insertion  was  introduced  by  D.  J.  Wheeler,  who 
actually  allowed  multiway  branching  similar  to  what  we  shall  discuss  in  Section 
6.2.4;  and  a binary  tree  insertion  technique  was  devised  by  C.  M.  Berners-Lee 
[see Comp. J. 2 (1959), 5].

The  first  published  descriptions  of  tree  insertion  were  by  P.  F.  Windley 
[Comp.  J.  3 (1960),  84-88],  A.  D.  Booth  and  A.  J.  T.  Colin  [Information  and 
Control  3 (1960),  327-334],  and  Thomas  N.  Hibbard  [JACM  9 (1962),  13-28]. 
Each  of  these  authors  seems  to  have  developed  the  method  independently  of 
the  others,  and  each  paper  derived  the  average  number  of  comparisons  (6)  in 
a different  way.  The  individual  authors  also  went  on  to  treat  different  aspects 
of  the  algorithm:  Windley  gave  a detailed  discussion  of  tree  insertion  sorting; 
Booth and Colin discussed the effect of preconditioning by making the first $2^n - 1$
elements  form  a perfectly  balanced  tree  (see  exercise  4);  Hibbard  introduced  the 
idea  of  deletion  and  showed  the  connection  between  the  analysis  of  tree  insertion 
and  the  analysis  of  quicksort. 

The idea of optimum binary search trees was first developed for the special case $p_1 = \cdots = p_n = 0$, in the context of alphabetic binary encodings like (33). A very interesting paper by E. N. Gilbert and E. F. Moore [Bell System Tech. J. 38 (1959), 933-968] discussed this problem and its relation to other coding problems. Gilbert and Moore proved Theorem M in the special case $P = 0$, and observed that an optimum tree could be constructed in $O(n^3)$ steps,
using  a method  like  Algorithm  K but  without  making  use  of  the  monotonicity 
relation  (17).  K.  E.  Iverson  [A  Programming  Language  (Wiley,  1962),  142-144] 
independently considered the other case, when all the $q$'s are zero. He suggested
that  an  optimum  tree  would  be  obtained  if  the  root  is  chosen  so  as  to  equalize  the 
left  and  right  subtree  probabilities  as  much  as  possible;  unfortunately  we  have 
seen  that  this  idea  doesn’t  work.  D.  E.  Knuth  [Acta  Informatica  1 (1971),  14-25, 
270]  subsequently  considered  the  case  of  general  p and  q weights  and  proved  that 
the algorithm could be reduced to $O(n^2)$ steps; he also presented an example
from  a compiler  application,  where  the  keys  in  the  tree  are  “reserved  words”  in 
an  ALGOL-like  language.  T.  C.  Hu  had  been  studying  his  own  algorithm  for  the 
case $p_j = 0$ for several years; a rigorous proof of the validity of that algorithm
was  difficult  to  find  because  of  the  complexity  of  the  problem,  but  he  eventually 
obtained  a proof  jointly  with  A.  C.  Tucker  [SIAM  J.  Applied  Math.  21  (1971), 
514-532].  Simplifications  leading  to  Algorithm  G were  found  several  years  later 
by  A.  M.  Garsia  and  M.  L.  Wachs,  SICOMP  6 (1977),  622-642,  although  their 
proof  was  still  rather  complicated.  Lemmas  W,  X,  Y,  and  Z above  are  due  to 
J.  H.  Kingston,  J.  Algorithms  9 (1988),  129-136.  Further  properties  have  been 
found  by  M.  Karpinski,  L.  L.  Larmore,  and  W.  Rytter,  Theoretical  Comp.  Sci. 
180  (1997),  309-324.  See  also  the  paper  by  Hu,  Kleitman,  and  Tamaki,  SIAM 
J.  Applied  Math.  37  (1979),  246-256,  for  an  elementary  proof  of  the  Hu-Tucker 
algorithm  and  some  generalizations  to  other  cost  functions. 

Theorem B is due to Paul J. Bayer, report MIT/LCS/TM-69 (Mass. Inst.
of  Tech.,  1975),  who  also  proved  a slightly  weaker  form  of  Theorem  M.  The 
stronger  form  above  is  due  to  K.  Mehlhorn,  SICOMP  6 (1977),  235-239. 

EXERCISES 

1.  [15]  Algorithm  T has  been  stated  only  for  nonempty  trees.  What  changes  should 
be  made  so  that  it  works  properly  for  the  empty  tree  too? 

2.  [20]  Modify  Algorithm  T so  that  it  works  with  right-threaded  trees.  (See  Section 
2.3.1;  symmetric  traversal  is  easier  in  such  trees.) 

► 3. [20] In Section 6.1 we found that a slight change to the sequential search Algorithm 6.1S made it faster (Algorithm 6.1Q). Can a similar trick be used to speed up
Algorithm  T? 

4. [M24] (A. D. Booth and A. J. T. Colin.) Given $N$ keys in random order, suppose that we use the first $2^n - 1$ to construct a perfectly balanced tree, placing $2^k$ keys on level $k$ for $0 \le k < n$; then we use Algorithm T to insert the remaining keys. What is the average number of comparisons in a successful search? [Hint: Modify Eq. (2).]

► 5.  [M25]  There  are  11!  = 39,916,800  different  orders  in  which  the  names  CAPRICORN, 
AQUARIUS,  etc.  could  have  been  inserted  into  a binary  search  tree. 

a)  How  many  of  these  arrangements  will  produce  Fig.  10? 

b) How many of these arrangements will produce a degenerate tree, in which LLINK or RLINK is $\Lambda$ in each node?

6. [M26] Let $P_{nk}$ be the number of permutations $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$ such that, if Algorithm T is used to insert $a_1, a_2, \ldots, a_n$ successively into an initially empty tree, exactly $k$ comparisons are made when $a_n$ is inserted. (In this problem, we will ignore the comparisons made when $a_1, \ldots, a_{n-1}$ were inserted. In the notation of the text, we have $C'_{n-1} = \bigl(\sum_k k P_{nk}\bigr)/n!$, since this is the average number of comparisons made in an unsuccessful search of a tree containing $n-1$ elements.)

a) Prove that $P_{(n+1)k} = 2P_{n(k-1)} + (n-1)P_{nk}$. [Hint: Consider whether or not $a_{n+1}$ falls below $a_n$ in the tree.]

b) Find a simple formula for the generating function $G_n(z) = \sum_k P_{nk} z^k$, and use your formula to express $P_{nk}$ in terms of Stirling numbers.

c)  What  is  the  variance  of  this  distribution? 

7.  [M25]  (S.  R.  Arora  and  W.  T.  Dent.)  After  n elements  have  been  inserted  into 
an  initially  empty  tree,  in  random  order,  what  is  the  average  number  of  comparisons 
needed  by  Algorithm  T to  find  the  mth  largest  element,  given  the  key  of  that  element? 

8. [M38] Let $p(n, k)$ be the probability that $k$ is the total internal path length of a
tree  built  by  Algorithm  T from  n randomly  ordered  keys.  (The  internal  path  length  is 
the  number  of  comparisons  made  by  tree  insertion  sorting  as  the  tree  is  being  built.) 

a)  Find  a recurrence  relation  that  defines  the  corresponding  generating  function. 

b)  Compute  the  variance  of  this  distribution.  [Several  of  the  exercises  in  Section  1.2.7 
may  be  helpful  here.] 

9. [41] We have proved that tree search and insertion requires only about $2 \ln N$
comparisons  when  the  keys  are  inserted  in  random  order;  but  in  practice,  the  order 
may  not  be  random.  Make  empirical  studies  to  see  how  suitable  tree  insertion  really  is 
for  symbol  tables  within  a compiler  and/or  assembler.  Do  the  identifiers  used  in  typical 
large  programs  lead  to  fairly  well-balanced  binary  search  trees? 

► 10.  [22]  (R.  W.  Floyd.)  Perhaps  we  are  not  interested  in  the  sorting  property  of 
Algorithm  T,  but  we  expect  that  the  input  will  come  in  nonrandom  order.  Devise  a 
way  to  keep  tree  search  efficient,  by  making  the  input  “appear  to  be”  in  random  order. 

11. [20] What is the maximum number of times the assignment $S \gets \mathrm{LLINK}(R)$ might be performed in step D3, when deleting a node from a tree of size $N$?

12.  [M22]  When  making  a random  deletion  from  a random  tree  of  N items,  how  often 
does  step  D1  go  to  D4,  on  the  average?  (See  the  proof  of  Theorem  H.) 

► 13.  [M23]  If  the  root  of  a random  tree  is  deleted  by  Algorithm  D,  is  the  resulting  tree 
still  random? 

► 14.  [22]  Prove  that  the  path  length  of  the  tree  produced  by  Algorithm  D with  step 
D1.5  added  is  never  more  than  the  path  length  of  the  tree  produced  without  that  step. 
Find  a case  where  step  D1.5  actually  decreases  the  path  length. 

15. [23] Let $a_1 a_2 a_3 a_4$ be a permutation of $\{1, 2, 3, 4\}$, and let $j = 1$, 2, or 3. Take the one-element tree with key $a_1$ and insert $a_2$, $a_3$ using Algorithm T; then delete $a_j$ using Algorithm D; then insert $a_4$ using Algorithm T. How many of the $4! \times 3$ possibilities produce trees of shape I, II, III, IV, V, respectively, in (13)?

► 16.  [25]  Is  the  deletion  operation  commutative ? That  is,  if  Algorithm  D is  used  to 
delete  X and  then  Y,  is  the  resulting  tree  the  same  as  if  Algorithm  D is  used  to  delete 
Y and then X?

17.  [25]  Show  that  if  the  roles  of  left  and  right  are  completely  reversed  in  Algorithm  D, 
it  is  easy  to  extend  the  algorithm  so  that  it  deletes  a given  node  from  a right-threaded 
tree,  preserving  the  necessary  threads.  (See  exercise  2.) 

18.  [M21]  Show  that  Zipf’s  law  yields  (12). 
19. [M23] What is the approximate average number of comparisons, (11), when the input probabilities satisfy the 80-20 law defined in Eq. 6.1-(11)?

20. [M20] Suppose we have inserted keys into a tree in order of decreasing frequency $p_1 \ge p_2 \ge \cdots \ge p_n$. Can this tree be substantially worse than the optimum search tree?

21.  [M20]  If  p,  q,  r are  probabilities  chosen  at  random,  subject  to  the  condition  that 
$p + q + r = 1$, what are the probabilities that trees I, II, III, IV, V of (13) are optimal,
respectively?  (Consider  the  relative  areas  of  the  regions  in  Fig.  14.) 

22. [M20] Prove that $r[i, j-1]$ is never greater than $r[i+1, j]$ when step K4 of Algorithm K is performed.

► 23. [M23] Find an optimum binary search tree for the case $N = 40$, with weights $p_1 = 9$, $p_2 = p_3 = \cdots = p_{40} = 1$, $q_0 = q_1 = \cdots = q_{40} = 0$. (Don't use a computer.)

24. [M25] Given that $p_n = q_n = 0$ and that the other weights are nonnegative, prove that an optimum tree for $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ may be obtained by replacing [n-1] by an internal node (n) whose children are [n-1] and [n] in any optimum tree for $(p_1, \ldots, p_{n-1}; q_0, \ldots, q_{n-1})$.

25. [M20] Let $A$ and $B$ be nonempty sets of real numbers, and define $A \le B$ if the following property holds:

$(a \in A,\ b \in B,\ \text{and}\ b < a)$ implies $(a \in B\ \text{and}\ b \in A)$.

a) Prove that this relation is transitive on nonempty sets.

b) Prove or disprove: $A \le B$ if and only if $A \le A \cup B \le B$.

26. [M22] Let $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ be nonnegative weights, where $p_n + q_n = x$. Prove that as $x$ varies from 0 to $\infty$, while $(p_1, \ldots, p_{n-1}; q_0, \ldots, q_{n-1})$ are held constant, the cost $c(0, n)$ of an optimum binary search tree is a concave, continuous, piecewise linear function of $x$ with integer slopes. In other words, prove that there exist positive integers $l_0 > l_1 > \cdots > l_m$ and real constants $0 = x_0 < x_1 < \cdots < x_m < x_{m+1} = \infty$ and $y_0 < y_1 < \cdots < y_m$ such that $c(0, n) = y_h + l_h x$ when $x_h \le x \le x_{h+1}$, for $0 \le h \le m$.

27. [M33] The object of this exercise is to prove that the sets of roots $R(i, j)$ of optimum binary search trees satisfy

$$R(i, j-1) \le R(i, j) \le R(i+1, j), \qquad \text{for } j - i \ge 2,$$

in terms of the relation defined in exercise 25, when the weights $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ are nonnegative. The proof is by induction on $j - i$; our task is to prove that $R(0, n-1) \le R(0, n)$, assuming that $n \ge 2$ and that the stated relation holds for $j - i < n$. [By left-right symmetry it follows that $R(0, n) \le R(1, n)$.]

a) Prove that $R(0, n-1) \le R(0, n)$ if $p_n = q_n = 0$. (See exercise 24.)

b) Let $p_n + q_n = x$. In the notation of exercise 26, let $R_h$ be the set $R(0, n)$ of optimum roots when $x_h < x < x_{h+1}$, and let $R'_h$ be the set of optimum roots when $x = x_h$. Prove that

$$R'_0 \le R_0 \le R'_1 \le R_1 \le \cdots \le R'_m \le R_m.$$

Hence by part (a) and exercise 25 we have $R(0, n-1) \le R(0, n)$ for all $x$. [Hint: Consider the case $x = x_h$, and assume that both the tree with root (r), subtrees $t(0, r-1)$ and $t(r, n)$, and [n] at level $l$, and the tree with root (s), subtrees $t(0, s-1)$ and $t(s, n)$, and [n] at level $l'$, are optimum, with $s < r$ and $l > l'$. Use the induction hypothesis to prove that there is an optimum tree with root (r) such that [n] is at level $l'$, and an optimum tree with root (s) such that [n] is at level $l$.]

28.  [24]  Use  some  macro  language  to  define  an  “optimum  binary  search”  macro, 
whose  parameter  is  a nested  specification  of  an  optimum  binary  tree. 

29.  [40]  What  is  the  worst  possible  binary  search  tree  for  the  31  most  common  English 
words,  using  the  frequency  data  of  Fig.  12? 

30. [M34] Prove that the costs of optimum binary search trees satisfy the “quadrangle inequality” $c(i, j) - c(i, j-1) \ge c(i+1, j) - c(i+1, j-1)$ when $j \ge i + 2$.

31. [M35] (K. C. Tan.) Prove that, among all possible sets of probabilities $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ with $p_1 + \cdots + p_n + q_0 + \cdots + q_n = 1$, the most expensive minimum-cost tree occurs when $p_i = 0$ for all $i$, $q_j = 0$ for all even $j$, and $q_j = 1/\lceil n/2 \rceil$ for all odd $j$.

► 32. [M25] Let $n + 1 = 2^m + k$, where $0 \le k \le 2^m$. There are exactly $\binom{2^m}{k}$ binary trees in which all external nodes appear on levels $m$ and $m + 1$. Show that, among all these trees, we obtain one with the minimum cost for the weights $(p_1, \ldots, p_n; q_0, \ldots, q_n)$ if we apply Algorithm K to the weights $(p_1, \ldots, p_n; M + q_0, \ldots, M + q_n)$ for sufficiently large $M$.

33. [M41] In order to find the binary search tree that minimizes the running time of Program T, we should minimize the quantity $7C + C1$ instead of simply minimizing
the  number  of  comparisons  C.  Develop  an  algorithm  that  finds  optimum  binary  search 
trees  when  different  costs  are  associated  with  left  and  right  branches  in  the  tree. 
(Incidentally,  when  the  right  cost  is  twice  the  left  cost,  and  the  node  frequencies  are  all 
equal,  the  Fibonacci  trees  turn  out  to  be  optimum;  see  L.  E.  Stanfel,  JACM  17  (1970), 
508-517.  On  machines  that  cannot  make  three-way  comparisons  at  once,  a program 
for  Algorithm  T will  have  to  make  two  comparisons  in  step  T2,  one  for  equality  and 
one  for  less-than;  B.  Sheil  and  V.  R.  Pratt  have  observed  that  these  comparisons  need 
not  involve  the  same  key,  and  it  may  well  be  best  to  have  a binary  tree  whose  internal 
nodes  specify  either  an  equality  test  or  a less-than  test  but  not  both.  This  situation 
would  be  interesting  to  explore  as  an  alternative  to  the  stated  problem.) 

34. [HM21] Show that the asymptotic value of the multinomial coefficient

$$\binom{N}{p_1 N,\ p_2 N,\ \ldots,\ p_n N}$$

as $N \to \infty$ is related to the entropy $H(p_1, p_2, \ldots, p_n)$.

35.  [HM22]  Complete  the  proof  of  Theorem  B by  establishing  the  inequality  (24). 

► 36. [HM25] (Claude Shannon.) Let $X$ and $Y$ be random variables with finite ranges $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$, and let $p_i = \Pr(X = x_i)$, $q_j = \Pr(Y = y_j)$, $r_{ij} = \Pr(X = x_i \text{ and } Y = y_j)$. Let $H(X) = H(p_1, \ldots, p_m)$ and $H(Y) = H(q_1, \ldots, q_n)$ be the respective entropies of the variables singly, and let $H(XY) = H(r_{11}, \ldots, r_{mn})$ be the entropy of their joint distribution. Prove that

$$H(X) \le H(XY) \le H(X) + H(Y).$$

[Hint: If $f$ is any concave function, we have $E f(X) \le f(E X)$.]

37. [HM26] (P. J. Bayer, 1975.) Suppose $(P_1, \ldots, P_n)$ is a random probability distribution, namely a random point in the $(n-1)$-dimensional simplex defined by $P_k \ge 0$ for $1 \le k \le n$ and $P_1 + \cdots + P_n = 1$. (Equivalently, $(P_1, \ldots, P_n)$ is a set of random spacings, in the sense of exercise 3.3.2-26.) What is the expected value of the entropy $H(P_1, \ldots, P_n)$?

38. [M20] Explain why Theorem M holds in general, although we have only proved it in the case $s_0 \le s_1 \le s_2 \le \cdots \le s_n$.

► 39. [M25] Let $w_1, \ldots, w_n$ be nonnegative weights with $w_1 + \cdots + w_n = 1$. Prove that the weighted path length of the Huffman tree constructed in Section 2.3.4.5 is less than $H(w_1, \ldots, w_n) + 1$. Hint: See the proof of Theorem M.

40. [M26] Complete the proof of Lemma Z.

41.  [21]  Figure  18  shows  the  construction  of  a tangled  binary  tree.  List  its  leaves  in 
left-to-right  order. 

42.  [23]  Explain  why  Subroutine  C preserves  the  2-descending  condition  (31). 

43.  [20]  Explain  how  to  implement  phase  2 of  the  Garsia-Wachs  algorithm  efficiently. 

► 44. [25] Explain how to implement phase 3 of the Garsia-Wachs algorithm efficiently: Construct a binary tree, given the levels $l_0, l_1, \ldots, l_n$ of its leaves in symmetric order.

► 45. [30] Explain how to implement Subroutine C so that the total running time of the Garsia-Wachs algorithm is at most $O(n \log n)$.

46. [M30] (C. K. Wong and Shi-Kuo Chang.) Consider a scheme whereby a binary search tree is constructed by Algorithm T, except that whenever the number of nodes reaches a number of the form $2^n - 1$ the tree is reorganized into a perfectly balanced uniform tree, with $2^k$ nodes on level $k$ for $0 \le k < n$. Prove that the total number of comparisons made while constructing such a tree is $N \lg N + O(N)$ on the average. (It is not difficult to show that the amount of time needed for the reorganizations is $O(N)$.)

47. [M40] Generalize Theorems B and M from binary trees to $t$-ary trees. If possible,
also  allow  the  branching  costs  to  be  nonuniform  as  in  exercise  33. 

48. [M47] Carry out a rigorous analysis of the steady state of a binary search tree
subjected  to  random  insertions  and  deletions. 

49.  [HM42]  Analyze  the  average  height  of  a random  binary  search  tree. 

6.2.3.  Balanced  Trees 

The  tree  insertion  algorithm  we  have  just  learned  will  produce  good  search  trees, 
when  the  input  data  is  random,  but  there  is  still  the  annoying  possibility  that 
a degenerate  tree  will  occur.  Perhaps  we  could  devise  an  algorithm  that  keeps 
the  tree  optimum  at  all  times;  but  unfortunately  that  seems  to  be  very  difficult. 
Another  idea  is  to  keep  track  of  the  total  path  length,  and  to  reorganize  the  tree 
completely whenever its path length exceeds $5N \lg N$, say. But such an approach might require about $\sqrt{N/2}$ reorganizations as the tree is being built.
A very pretty solution to the problem of maintaining a good search tree was discovered in 1962 by two Russian mathematicians, G. M. Adelson-Velsky and E. M. Landis [Doklady Akademii Nauk SSSR 146 (1962), 263-266; English translation in Soviet Math. Doklady 3 (1962), 1259-1263]. Their method requires only two extra bits per node, and it never uses more than $O(\log N)$ operations to search the tree or to insert an item. In fact, we shall see that their approach also leads to a general technique that is good for representing arbitrary linear lists of length $N$, so that each of the following operations can be done in only $O(\log N)$ units of time:

i)  Find  an  item  having  a given  key. 

ii) Find the $k$th item, given $k$.

iii)  Insert  an  item  at  a specified  place. 

iv)  Delete  a specified  item. 

If we use sequential allocation for linear lists, operations (i) and (ii) are efficient but operations (iii) and (iv) take order $N$ steps; on the other hand, if we use linked allocation, operations (iii) and (iv) are efficient but operations (i) and (ii) take order $N$ steps. A tree representation of linear lists can do all four operations in $O(\log N)$ steps. And it is also possible to do other standard operations with comparable efficiency, so that, for example, we can concatenate a list of $M$ elements with a list of $N$ elements in $O(\log(M + N))$ steps.

The method for achieving all this involves what we shall call balanced trees.
(Many authors also call them AVL trees, where the AV stands for Adelson-Velsky
and the L stands for Landis.) The preceding paragraph is an advertisement for
balanced trees, which makes them sound like a universal panacea that makes all
other forms of data representation obsolete; but of course we ought to have a
balanced attitude about balanced trees! In applications that do not involve all
four of the operations above, we may be able to get by with substantially less
overhead and simpler programming. Furthermore, there is no advantage to balanced
trees unless N is reasonably large; thus if we have an efficient method that
takes 64 lg N units of time and an inefficient method that takes 2N units of time,
we should use the inefficient method unless N is greater than 256. On the other
hand, N shouldn't be too large, either; balanced trees are appropriate chiefly for
internal storage of data, and we shall study better methods for external direct-access
files in Section 6.2.4. Since internal memories seem to be getting larger and
larger as time goes by, balanced trees are becoming more and more important.
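The break-even point quoted above can be checked directly. The sketch below is in Python (an assumption; the book itself uses MIX), and the function names are hypothetical. It starts the search at N = 2 because at N = 1 the "efficient" cost 64 lg 1 is zero, a degenerate case the text does not consider.

```python
import math

# Hypothetical cost models taken from the text (units of time).
def efficient(n):
    return 64 * math.log2(n)

def inefficient(n):
    return 2 * n

# The two costs tie at N = 256 (64 * 8 = 2 * 256 = 512); find the first N >= 2
# at which the "efficient" method is strictly faster.
crossover = next(n for n in range(2, 10_000) if efficient(n) < inefficient(n))
print(crossover)  # 257
```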

The height of a tree is defined to be its maximum level, the length of the
longest path from the root to an external node. A binary tree is called balanced
if the height of the left subtree of every node never differs by more than ±1 from
the height of its right subtree. Figure 20 shows a balanced tree with 17 internal
nodes and height 5; the balance factor within each node is shown as +, •, or −
according as the right subtree height minus the left subtree height is +1, 0, or −1.
The Fibonacci tree in Fig. 8 (Section 6.2.1) is another balanced binary tree of
height 5, having only 12 internal nodes; most of the balance factors in that tree
are −1. The zodiac tree in Fig. 10 (Section 6.2.2) is not balanced, because the
height restriction on subtrees fails at both the AQUARIUS and GEMINI nodes.

Fig. 20. A balanced binary tree.

This definition of balance represents a compromise between optimum binary
trees (with all external nodes required to be on two adjacent levels) and arbitrary
binary trees (unrestricted). It is therefore natural to ask how far from optimum
a balanced tree can be. The answer is that its search paths will never be more
than 45 percent longer than the optimum:

Theorem A (Adelson-Velsky and Landis). The height of a balanced tree with
N internal nodes always lies between lg(N + 1) and 1.4405 lg(N + 2) − 0.3277.

Proof. A binary tree of height h obviously cannot have more than $2^h$ external
nodes; so N + 1 ≤ $2^h$, that is, h ≥ ⌈lg(N + 1)⌉ in any binary tree.

In order to find the maximum value of h, let us turn the problem around and
ask for the minimum number of nodes possible in a balanced tree of height h.
Let $T_h$ be such a tree with fewest possible nodes; then one of the subtrees of
the root, say the left subtree, has height h − 1, and the other subtree has height
h − 1 or h − 2. Since we want $T_h$ to have the minimum number of nodes, we may
assume that the left subtree of the root is $T_{h-1}$, and that the right subtree is
$T_{h-2}$. This argument shows that the Fibonacci tree of order h + 1 has the fewest
possible nodes among all possible balanced trees of height h. (See the definition
of Fibonacci trees in Section 6.2.1.) Thus

$$N \ge F_{h+2} - 1 > \phi^{h+2}/\sqrt{5} - 2,$$

and the stated result follows as in the corollary to Theorem 4.5.3F. ∎

The proof of this theorem shows that a search in a balanced tree will require
more than 25 comparisons only if the tree contains at least $F_{28} - 1$ = 317,810
nodes.
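The counting argument in the proof can be checked numerically. This Python sketch (the language and the helper names `min_nodes` and `fib` are assumptions, not from the text) computes the fewest nodes in a balanced tree of height h from the recurrence implied by the proof, compares it with $F_{h+2} - 1$, and tests both bounds of Theorem A at that extreme case.

```python
import math


def min_nodes(h):
    """Fewest internal nodes in a balanced tree of height h.

    The extreme tree T_h has T_{h-1} and T_{h-2} below its root, so
    min_nodes(h) = min_nodes(h-1) + min_nodes(h-2) + 1.
    """
    if h == 0:
        return 0
    prev, cur = 0, 1                  # min_nodes(0), min_nodes(1)
    for _ in range(h - 1):
        prev, cur = cur, prev + cur + 1
    return cur


def fib(n):
    a, b = 0, 1                       # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a


for h in range(1, 40):
    n = min_nodes(h)
    assert n == fib(h + 2) - 1                        # Fibonacci tree of order h + 1
    assert math.ceil(math.log2(n + 1)) <= h           # lower bound of Theorem A
    assert h <= 1.4405 * math.log2(n + 2) - 0.3277    # upper bound, tightest case

print(min_nodes(5))   # 12, as for the Fibonacci tree of Fig. 8
print(fib(28) - 1)    # 317810: fewest nodes in a balanced tree of height 26
```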

Consider now what happens when a new node is inserted into a balanced
tree using tree insertion (Algorithm 6.2.2T). In Fig. 20, the tree will still be
balanced if the new node takes the place of [4], [5], [6], [7], [10], or [13], but
some adjustment will be needed if the new node falls elsewhere. The problem
arises when we have a node with a balance factor of +1 whose right subtree
got higher after the insertion; or, dually, if the balance factor is −1 and the left
subtree got higher. It is not difficult to see that trouble arises only in two cases:

[Diagrams (1): Case 1 and Case 2.]


(Two other essentially identical cases occur if we reflect these diagrams, interchanging
left and right.) In these diagrams the large rectangles α, β, γ, δ
represent subtrees having the respective heights shown. Case 1 occurs when a
new element has just increased the height of node B's right subtree from h to
h + 1, and Case 2 occurs when the new element has increased the height of B's
left subtree. In the second case, we have either h = 0 (so that X itself was the
new node), or else node X has two subtrees of respective heights (h − 1, h) or
(h, h − 1).

Simple transformations will restore balance in both of these cases, while
preserving the symmetric order of the tree nodes:

[Diagrams (2): Case 1 and Case 2 after rebalancing.]

In Case 1 we simply “rotate” the tree to the left, attaching β to A instead of B.
This transformation is like applying the associative law to an algebraic formula,
replacing α(βγ) by (αβ)γ. In Case 2 we use a double rotation, first rotating
(X, B) right, then (A, X) left. In both cases only a few links of the tree need to
be changed. Furthermore, the new trees have height h + 2, which is exactly the
height that was present before the insertion; hence the rest of the tree (if any)
that was originally above node A always remains balanced.
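The two transformations can be sketched in a few lines. This is a Python illustration under stated assumptions: the `Node` class and the function names are hypothetical, keys are small integers chosen so that symmetric order is easy to check, and heights follow the text's convention that an empty subtree has height 0. Case 1 is built with h = 1 and Case 2 with h = 0 (so X is the new node).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(t):
    """Height as defined in the text: an empty subtree has height 0."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


def rotate_left(a):
    """Case 1: B = A.right becomes the root, and beta moves from B to A."""
    b = a.right
    a.right = b.left       # attach beta to A instead of B
    b.left = a
    return b


def rotate_right(b):
    """Mirror image of rotate_left."""
    x = b.left
    b.left = x.right
    x.right = b
    return x


def double_rotate_left(a):
    """Case 2: rotate (X, B) right, then (A, X) left."""
    a.right = rotate_right(a.right)
    return rotate_left(a)


# Case 1, h = 1: alpha = 1, A = 2, beta = 3, B = 4, gamma = 6 (new node 5 below it).
root1 = rotate_left(Node(2, Node(1), Node(4, Node(3), Node(6, Node(5)))))
print(root1.key, inorder(root1), height(root1))   # 4 [1, 2, 3, 4, 5, 6] 3

# Case 2, h = 0: A = 1, B = 3, and X = 2 is the newly inserted node.
root2 = double_rotate_left(Node(1, None, Node(3, Node(2))))
print(root2.key, inorder(root2), height(root2))   # 2 [1, 2, 3] 2
```

In both runs the symmetric order is unchanged and the new subtree has height h + 2, the height the subtree had before the insertion.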

For example, if we insert a new node into position [17] of Fig. 20 we obtain
the balanced tree shown in Fig. 21, after a single rotation (Case 1). Notice that
several of the balance factors have changed.

The details of this insertion procedure can be worked out in several ways.
At first glance an auxiliary stack seems to be necessary, in order to keep track
of which nodes will be affected, but the following algorithm gains some speed by
exploiting the fact that the balance factor of node B in (1) was zero before the
insertion.

Fig. 21. The tree of Fig. 20, rebalanced after a new key R has been inserted.

Algorithm A (Balanced tree search and insertion). Given a table of records
that form a balanced binary tree as described above, this algorithm searches for
a given argument K. If K is not in the table, a new node containing K is inserted
into the tree in the appropriate place and the tree is rebalanced if necessary.

The nodes of the tree are assumed to contain KEY, LLINK, and RLINK fields
as in Algorithm 6.2.2T. We also have a new field

B(P) = balance factor of NODE(P),

the height of the right subtree minus the height of the left subtree; this field
always contains either +1, 0, or −1. A special header node also appears at the
top of the tree, in location HEAD; the value of RLINK(HEAD) is a pointer to the
root of the tree, and LLINK(HEAD) is used to keep track of the overall height of
the tree. (Knowledge of the height is not really necessary for this algorithm, but
it is useful in the concatenation procedure discussed below.) We assume that
the tree is nonempty, namely that RLINK(HEAD) ≠ Λ.

For convenience in description, the algorithm uses the notation LINK(a, P)
as a synonym for LLINK(P) if a = −1, and for RLINK(P) if a = +1.

A1. [Initialize.] Set T ← HEAD, S ← P ← RLINK(HEAD). (The pointer variable P
will move down the tree; S will point to the place where rebalancing may
be necessary, and T always points to the parent of S.)

A2. [Compare.] If K < KEY(P), go to A3; if K > KEY(P), go to A4; and if
K = KEY(P), the search terminates successfully.

A3. [Move left.] Set Q ← LLINK(P). If Q = Λ, set Q ⇐ AVAIL and LLINK(P) ← Q
and go to step A5. Otherwise if B(Q) ≠ 0, set T ← P and S ← Q. Finally
set P ← Q and return to step A2.

A4. [Move right.] Set Q ← RLINK(P). If Q = Λ, set Q ⇐ AVAIL and RLINK(P) ← Q
and go to step A5. Otherwise if B(Q) ≠ 0, set T ← P and S ← Q. Finally set
P ← Q and return to step A2. (The last part of this step may be combined
with the last part of step A3.)

Fig. 22. Balanced tree search and insertion.

A5. [Insert.] (We have just linked a new node, NODE(Q), into the tree, and its
fields need to be initialized.) Set KEY(Q) ← K, LLINK(Q) ← RLINK(Q) ← Λ,
and B(Q) ← 0.

A6. [Adjust balance factors.] (Now the balance factors on nodes between S
and Q need to be changed from zero to ±1.) If K < KEY(S) set a ← −1,
otherwise set a ← +1. Then set R ← P ← LINK(a, S), and repeatedly do
the following operations zero or more times until P = Q: If K < KEY(P) set
B(P) ← −1 and P ← LLINK(P); if K > KEY(P), set B(P) ← +1 and
P ← RLINK(P). (If K = KEY(P), then P = Q and we proceed to the next step.)

A7. [Balancing act.] Several cases now arise:

i) If B(S) = 0 (the tree has grown higher), set B(S) ← a,
LLINK(HEAD) ← LLINK(HEAD) + 1, and terminate the algorithm.

ii) If B(S) = −a (the tree has gotten more balanced), set B(S) ← 0 and
terminate the algorithm.

iii) If B(S) = a (the tree has gotten out of balance), go to step A8 if
B(R) = a, to A9 if B(R) = −a.

(Case (iii) corresponds to the situations depicted in (1) when a = +1;
S and R point, respectively, to nodes A and B, and LINK(−a, S) points
to α, etc.)




A8. [Single rotation.] Set P ← R, LINK(a, S) ← LINK(−a, R), LINK(−a, R) ← S,
B(S) ← B(R) ← 0. Go to A10.

A9. [Double rotation.] Set P ← LINK(−a, R), LINK(−a, R) ← LINK(a, P),
LINK(a, P) ← R, LINK(a, S) ← LINK(−a, P), LINK(−a, P) ← S. Now set

$$\bigl(B(S), B(R)\bigr) \leftarrow \begin{cases} (-a, 0), & \text{if } B(P) = a;\\ (0, 0), & \text{if } B(P) = 0;\\ (0, a), & \text{if } B(P) = -a; \end{cases} \tag{3}$$

and then set B(P) ← 0.

A10. [Finishing touch.] (We have completed the rebalancing transformation,
taking (1) to (2), with P pointing to the new subtree root and T pointing
to the parent of the old subtree root S.) If S = RLINK(T) then set
RLINK(T) ← P, otherwise set LLINK(T) ← P. ∎

This algorithm is rather long, but it divides into three simple parts: Steps
A1-A4 do the search, steps A5-A7 insert a new node, and steps A8-A10 rebalance
the tree if necessary. Essentially the same method can be used if the tree
is threaded (see exercise 6.2.2-2), since the balancing act never needs to make
difficult changes to thread links.
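For readers who want to experiment, here is a fairly literal transcription of Algorithm A into Python. It is a sketch, not a replacement for Program A: Python's `None` plays the role of Λ, node allocation replaces `Q ⇐ AVAIL`, and the attribute `height` stands in for LLINK(HEAD) so that the header's LLINK need not hold an integer. The class and function names (`Node`, `BalancedTree`, `search_insert`, `link`, `set_link`, `check_tree`) are hypothetical. As in the algorithm, the tree must be nonempty, so the constructor takes a first key.

```python
class Node:
    """Tree node with the fields KEY, LLINK, RLINK and B of Algorithm A."""
    __slots__ = ("key", "llink", "rlink", "b")

    def __init__(self, key=None):
        self.key, self.llink, self.rlink, self.b = key, None, None, 0


def link(a, p):
    """LINK(a, P): LLINK(P) if a = -1, RLINK(P) if a = +1."""
    return p.llink if a < 0 else p.rlink


def set_link(a, p, q):
    if a < 0:
        p.llink = q
    else:
        p.rlink = q


class BalancedTree:
    """HEAD.rlink points to the root; `height` stands in for LLINK(HEAD)."""

    def __init__(self, first_key):
        self.head = Node()
        self.head.rlink = Node(first_key)
        self.height = 1

    def search_insert(self, k):
        t = self.head                          # A1. Initialize.
        s = p = self.head.rlink
        while True:
            if k == p.key:                     # A2. Compare.
                return p                       # successful search
            a = -1 if k < p.key else +1        # A3 (move left) or A4 (move right)
            q = link(a, p)
            if q is None:
                q = Node(k)                    # A5. Insert (B(Q) = 0, links null).
                set_link(a, p, q)
                break
            if q.b != 0:
                t, s = p, q
            p = q
        a = -1 if k < s.key else +1            # A6. Adjust balance factors.
        r = p = link(a, s)
        while p is not q:
            if k < p.key:
                p.b = -1
                p = p.llink
            else:
                p.b = +1
                p = p.rlink
        if s.b == 0:                           # A7 (i): the tree has grown higher.
            s.b = a
            self.height += 1
            return q
        if s.b == -a:                          # A7 (ii): the tree is more balanced.
            s.b = 0
            return q
        if r.b == a:                           # A8. Single rotation.
            p = r
            set_link(a, s, link(-a, r))
            set_link(-a, r, s)
            s.b = r.b = 0
        else:                                  # A9. Double rotation.
            p = link(-a, r)
            set_link(-a, r, link(a, p))
            set_link(a, p, r)
            set_link(a, s, link(-a, p))
            set_link(-a, p, s)
            s.b, r.b = {a: (-a, 0), 0: (0, 0), -a: (0, a)}[p.b]   # table (3)
            p.b = 0
        if s is t.rlink:                       # A10. Finishing touch.
            t.rlink = p
        else:
            t.llink = p
        return q


def inorder(p):
    return [] if p is None else inorder(p.llink) + [p.key] + inorder(p.rlink)


def check_tree(p):
    """Return the height of subtree p after verifying every balance factor."""
    if p is None:
        return 0
    hl, hr = check_tree(p.llink), check_tree(p.rlink)
    assert p.b == hr - hl and abs(p.b) <= 1
    return 1 + max(hl, hr)


tree = BalancedTree(50)
for key in [20, 70, 10, 30, 25]:
    tree.search_insert(key)
print(inorder(tree.head.rlink))   # [10, 20, 25, 30, 50, 70]
```

The last insertion (25) triggers a double rotation at the root, which makes 30 the new root while the height stays 3.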

We know that the algorithm takes about C log N units of time, for some C,
but it is important to know the approximate value of C so that we can tell how
large N should be in order to make balanced trees worth all the trouble. The
following MIX implementation gives some insight into this question.

Program A (Balanced tree search and insertion). This program for Algorithm A
uses tree nodes having the form

    +---+-------+-------+
    | B | LLINK | RLINK |
    +---+-------+-------+
    |        KEY        |
    +-------------------+

rA = K, rI1 = P, rI2 = Q, rI3 = R, rI4 = S, rI5 = T. The code for steps A7-A9
is duplicated so that the value of a appears implicitly (not explicitly) in the
program.


    01   B      EQU   0:1
    02   LLINK  EQU   2:3
    03   RLINK  EQU   4:5
    04   START  LDA   K            1          A1. Initialize.
    05          ENT5  HEAD         1          T ← HEAD.
    06          LD2   0,5(RLINK)   1          Q ← RLINK(HEAD).
    07          JMP   2F           1          To A2 with S ← P ← Q.
    08   4H     LD2   0,1(RLINK)   C2         A4. Move right. Q ← RLINK(P).
    09          J2Z   5F           C2         To A5 if Q = Λ.
    10   1H     LDX   0,2(B)       C-1        rX ← B(Q).
    11          JXZ   *+3          C-1        Jump if B(Q) = 0.
    12          ENT5  0,1          D-1        T ← P.
    13   2H     ENT4  0,2          D          S ← Q.
    14          ENT1  0,2          C          P ← Q.
    15          CMPA  1,1          C          A2. Compare.
    16          JG    4B           C          To A4 if K > KEY(P).
    17          JE    SUCCESS      C1         Exit if K = KEY(P).
    18          LD2   0,1(LLINK)   C1-S       A3. Move left. Q ← LLINK(P).
    19          J2NZ  1B           C1-S       Jump if Q ≠ Λ.
    20   5H     LD2   AVAIL        1-S        A5. Insert.
    21          J2Z   OVERFLOW     1-S
    22          LDX   0,2(RLINK)   1-S
    23          STX   AVAIL        1-S        Q ⇐ AVAIL.
    24          STA   1,2          1-S        KEY(Q) ← K.
    25          STZ   0,2          1-S        LLINK(Q) ← RLINK(Q) ← Λ.
    26          JL    1F           1-S        Was K < KEY(P)?
    27          ST2   0,1(RLINK)   A          RLINK(P) ← Q.
    28          JMP   *+2          A
    29   1H     ST2   0,1(LLINK)   1-S-A      LLINK(P) ← Q.
    30   6H     CMPA  1,4          1-S        A6. Adjust balance factors.
    31          JL    *+3          1-S        Jump if K < KEY(S).
    32          LD3   0,4(RLINK)   E          R ← RLINK(S).
    33          JMP   *+2          E
    34          LD3   0,4(LLINK)   1-S-E      R ← LLINK(S).
    35          ENT1  0,3          1-S        P ← R.
    36          ENTX  -1           1-S        rX ← −1.
    37          JMP   1F           1-S        To comparison loop.
    38   4H     JE    7F           F2+1-S     To A7 if K = KEY(P).
    39          STX   0,1(1:1)     F2         B(P) ← +1 (it was +0).
    40          LD1   0,1(RLINK)   F2         P ← RLINK(P).
    41   1H     CMPA  1,1          F+1-S
    42          JGE   4B           F+1-S      Jump if K ≥ KEY(P).
    43          STX   0,1(B)       F1         B(P) ← −1.
    44          LD1   0,1(LLINK)   F1         P ← LLINK(P).
    45          JMP   1B           F1         To comparison loop.
    46   7H     LD2   0,4(B)       1-S        A7. Balancing act. rI2 ← B(S).
    47          STZ   0,4(B)       1-S        B(S) ← 0.
    48          CMPA  1,4          1-S
    49          JG    A7R          1-S        To a = +1 routine if K > KEY(S).
    50   A7L    J2P   DONE         U1         Exit if rI2 = −a.
    51          J2Z   7F           G1+J1      Jump if B(S) was zero.
    52          ENT1  0,3          G1         P ← R.
    53          LD2   0,3(B)       G1         rI2 ← B(R).
    54          J2N   A8L          G1         To A8 if rI2 = a.
    55   A9L    LD1   0,3(RLINK)   H1         A9. Double rotation.
    56          LDX   0,1(LLINK)   H1         LINK(a, P ← LINK(−a, R))
    57          STX   0,3(RLINK)   H1           → LINK(−a, R).
    58          ST3   0,1(LLINK)   H1         LINK(a, P) ← R.
    59          LD2   0,1(B)       H1         rI2 ← B(P).
    60          LDX   T1,2         H1         −a, 0, or 0
    61          STX   0,4(B)       H1           → B(S).
    62          LDX   T2,2         H1         0, 0, or a
    63          STX   0,3(B)       H1           → B(R).
    64   A8L    LDX   0,1(RLINK)   G1         A8. Single rotation.
    65          STX   0,4(LLINK)   G1         LINK(a, S) ← LINK(−a, P).
    66          ST4   0,1(RLINK)   G1         LINK(−a, P) ← S.
    67          JMP   8F           G1         Join up with the other branch.
    68   A7R    J2N   DONE         U2         Exit if rI2 = −a.
    69          J2Z   6F           G2+J2      Jump if B(S) was zero.
    70          ENT1  0,3          G2         P ← R.
    71          LD2   0,3(B)       G2         rI2 ← B(R).
    72          J2P   A8R          G2         To A8 if rI2 = a.
    73   A9R    LD1   0,3(LLINK)   H2         A9. Double rotation.
    74          LDX   0,1(RLINK)   H2         LINK(a, P ← LINK(−a, R))
    75          STX   0,3(LLINK)   H2           → LINK(−a, R).
    76          ST3   0,1(RLINK)   H2         LINK(a, P) ← R.
    77          LD2   0,1(B)       H2         rI2 ← B(P).
    78          LDX   T2,2         H2         −a, 0, or 0
    79          STX   0,4(B)       H2           → B(S).
    80          LDX   T1,2         H2         0, 0, or a
    81          STX   0,3(B)       H2           → B(R).
    82   A8R    LDX   0,1(LLINK)   G2         A8. Single rotation.
    83          STX   0,4(RLINK)   G2         LINK(a, S) ← LINK(−a, P).
    84          ST4   0,1(LLINK)   G2         LINK(−a, P) ← S.
    85   8H     STZ   0,1(B)       G          B(P) ← 0.
    86   A10    CMP4  0,5(RLINK)   G          A10. Finishing touch.
    87          JNE   *+3          G          Jump if RLINK(T) ≠ S.
    88          ST1   0,5(RLINK)   G3         RLINK(T) ← P.
    89          JMP   DONE         G3         Exit.
    90          ST1   0,5(LLINK)   G4         LLINK(T) ← P.
    91          JMP   DONE         G4         Exit.
    92          CON   +1
    93   T1     CON   0                       Table for (3).
    94   T2     CON   0
    95          CON   -1
    96   6H     ENTX  +1           J2         rX ← +1.
    97   7H     STX   0,4(B)       J          B(S) ← a.
    98          LDX   HEAD(LLINK)  J          LLINK(HEAD)
    99          INCX  1            J            + 1
   100          STX   HEAD(LLINK)  J            → LLINK(HEAD).
   101   DONE   EQU   *            1-S        Insertion is complete. ∎

Analysis of balanced tree insertion. [Nonmathematical readers, please skip
to (10).] In order to figure out the running time of Algorithm A, we would like
to know the answers to the following questions:

• How many comparisons are made during the search?

• How far apart will nodes S and Q be? (In other words, how much adjustment
is needed in step A6?)

• How often do we need to do a single or double rotation?

It is not difficult to derive upper bounds on the worst-case running time, using
Theorem A, but of course in practice we want to know the average behavior.
No theoretical determination of the average behavior has been successfully completed
as yet, since the algorithm appears to be quite complicated, but several
interesting theoretical and empirical results have been obtained.

In the first place we can ask about the number $B_{nh}$ of balanced binary trees
with n internal nodes and height h. It is not difficult to compute the generating
function $B_h(z) = \sum_{n\ge0} B_{nh} z^n$ for small h, from the relations

$$B_0(z) = 1, \qquad B_1(z) = z, \qquad B_{h+1}(z) = zB_h(z)\bigl(B_h(z) + 2B_{h-1}(z)\bigr). \tag{5}$$

(See exercise 6.) Thus

$$B_2(z) = 2z^2 + z^3,$$
$$B_3(z) = 4z^4 + 6z^5 + 4z^6 + z^7,$$
$$B_4(z) = 16z^7 + 32z^8 + 44z^9 + \cdots + 8z^{14} + z^{15},$$

and in general $B_h(z)$ has the form

$$2^{F_{h+1}-1} z^{F_{h+2}-1} + 2^{F_{h+1}-2} L_{h-1} z^{F_{h+2}} + \text{complicated terms} + 2^{h-1} z^{2^h-2} + z^{2^h-1} \tag{6}$$

for h ≥ 3, where $L_k = F_{k+1} + F_{k-1}$. (This formula generalizes Theorem A.) The
total number of balanced trees with height h is $B_h = B_h(1)$, which satisfies the
recurrence

$$B_0 = B_1 = 1, \qquad B_{h+1} = B_h^2 + 2B_hB_{h-1}, \tag{7}$$

so that $B_2 = 3$, $B_3 = 3\cdot5$, $B_4 = 3^2\cdot5\cdot7$, $B_5 = 3^3\cdot5^2\cdot7\cdot23$; and, in general,

$$B_h = A_0^{F_h} A_1^{F_{h-1}} \cdots A_{h-1}^{F_1} A_h^{F_0}, \tag{8}$$

where $A_0 = 1$, $A_1 = 3$, $A_2 = 5$, $A_3 = 7$, $A_4 = 23$, $A_5 = 347$, ..., $A_h = A_{h-1}B_{h-2} + 2$.
The sequences $B_h$ and $A_h$ grow very rapidly; in fact, they are
doubly exponential: Exercise 7 shows that there is a real number θ ≈ 1.43687
such that

$$B_h = \lfloor\theta^{2^h}\rfloor - \lfloor\theta^{2^{h-1}}\rfloor + \lfloor\theta^{2^{h-2}}\rfloor - \cdots + (-1)^h\lfloor\theta^{2^0}\rfloor. \tag{9}$$

If we consider each of the $B_h$ trees to be equally likely, exercise 8 shows that the
average number of nodes in a tree of height h is

$$B_h'(1)/B_h(1) \approx (0.70118)\,2^h - 1. \tag{10}$$

This indicates that the height of a balanced tree with N nodes is usually much
closer to $\log_2 N$ than to $\log_\phi N$.
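Relations (5) through (8) are easy to check by machine. The Python sketch below (an illustration; the helper names are hypothetical) represents each $B_h(z)$ as a list of coefficients indexed by the power of z, builds the polynomials from recurrence (5), and then checks the totals (7), the sequence $A_h$, the factorization (8), and the extreme terms of (6).

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = power of z)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def balanced_gfs(hmax):
    """[B_0(z), ..., B_hmax(z)] computed from recurrence (5)."""
    B = [[1], [0, 1]]                                   # B_0(z) = 1, B_1(z) = z
    for h in range(1, hmax):
        inner = poly_add(B[h], [2 * c for c in B[h - 1]])
        B.append([0] + poly_mul(B[h], inner))           # leading 0 multiplies by z
    return B


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


B = balanced_gfs(5)
totals = [sum(p) for p in B]                            # B_h = B_h(1)
A = [1, 3]
for h in range(2, 6):
    A.append(A[h - 1] * totals[h - 2] + 2)              # A_h = A_{h-1} B_{h-2} + 2

print(B[2])      # [0, 0, 2, 1]
print(B[3])      # [0, 0, 0, 0, 4, 6, 4, 1]
print(totals)    # [1, 1, 3, 15, 315, 108675]
print(A)         # [1, 3, 5, 7, 23, 347]

for h in range(6):                                      # factorization (8)
    prod = 1
    for j in range(h + 1):
        prod *= A[j] ** fib(h - j)
    assert prod == totals[h]

for h in range(3, 6):                                   # first and last terms of (6)
    p, L = B[h], fib(h) + fib(h - 2)                    # L = L_{h-1} = F_h + F_{h-2}
    assert p[fib(h + 2) - 1] == 2 ** (fib(h + 1) - 1)
    assert p[fib(h + 2)] == 2 ** (fib(h + 1) - 2) * L
    assert p[2 ** h - 2] == 2 ** (h - 1) and p[2 ** h - 1] == 1
```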

Unfortunately, these results don't really have much to do with Algorithm A,
since the mechanism of that algorithm makes some trees significantly more
probable than others. For example, consider the case N = 7, where 17 balanced
trees are possible. There are 7! = 5040 possible orderings in which seven keys
can be inserted, and the perfectly balanced “complete” tree

(11) [Tree diagram.]

is obtained 2160 times. By contrast, the Fibonacci tree

(12) [Tree diagram.]

occurs only 144 times, and the similar tree

(13) [Tree diagram.]

occurs 216 times. Replacing the left subtrees of (12) and (13) by arbitrary four-node
balanced trees, and then reflecting left and right, yields 16 different trees;
the eight generated from (12) each occur 144 times, and those generated from
(13) each occur 216 times. It is surprising that (13) is more common than (12).
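These frequencies can be reproduced by brute force over all 7! insertion orders. The sketch below is in Python and uses a recursive AVL insertion with stored heights rather than the pointer-based steps of Algorithm A (a swapped-in formulation, named plainly here); it performs the same single and double rotations, so it builds the same tree shapes. Nodes are tuples `(left, key, right, height)`, and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def ht(t):
    return 0 if t is None else t[3]


def node(left, key, right):
    return (left, key, right, 1 + max(ht(left), ht(right)))


def rotate_left(t):
    left, key, right, _ = t
    return node(node(left, key, right[0]), right[1], right[2])


def rotate_right(t):
    left, key, right, _ = t
    return node(left[0], left[1], node(left[2], key, right))


def rebalance(t):
    left, key, right, _ = t
    if ht(right) - ht(left) == 2:
        if ht(right[0]) > ht(right[2]):               # double rotation
            t = node(left, key, rotate_right(right))
        return rotate_left(t)
    if ht(left) - ht(right) == 2:
        if ht(left[2]) > ht(left[0]):                 # double rotation
            t = node(rotate_left(left), key, right)
        return rotate_right(t)
    return t


def insert(t, key):
    if t is None:
        return node(None, key, None)
    left, k, right, _ = t
    if key < k:
        return rebalance(node(insert(left, key), k, right))
    return rebalance(node(left, k, insert(right, key)))


def shape(t):
    """Forget the keys, keeping only the tree's shape."""
    return None if t is None else (shape(t[0]), shape(t[2]))


def build(keys):
    t = None
    for key in keys:
        t = insert(t, key)
    return t


counts = Counter(shape(build(p)) for p in permutations(range(1, 8)))
complete = shape(build([4, 2, 6, 1, 3, 5, 7]))
print(len(counts), sum(counts.values()))   # 17 5040
print(counts[complete])                    # the text reports 2160
print(sorted(counts.values()))             # the text reports 144 (8 times), 216 (8 times), 2160
```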

The fact that the perfectly balanced tree is obtained with such high probability,
together with (10), which corresponds to the case of equal probabilities,
makes it plausible that the average search time for a balanced tree should
be about lg N + c comparisons for some small constant c. But R. W. Floyd
has observed that the coefficient of lg N is unlikely to be exactly 1, because the
root of the tree would then be near the median, and the roots of its two subtrees
would be near the quartiles; then single and double rotation could not easily keep
the root near the median. Empirical tests indicate that the true average number
of comparisons needed to insert the Nth item is approximately 1.01 lg N + 0.1,
except when N is small.

In order to study the behavior of the insertion and rebalancing phases of
Algorithm A, we can classify the external nodes of balanced trees as shown
in Fig. 23. The path leading up from an external node can be specified by a
sequence of +'s and −'s (+ for a right link, − for a left link); we write down the
link specifications until reaching the first node with a nonzero balance factor,
or until reaching the root, if there is no such node. Then we write A or B
according as the new tree will be balanced or unbalanced when an internal node
is inserted in the given place. Thus the path up from [3] is ++−B, meaning
“right link, right link, left link, unbalance.” A specification ending in A requires
no rebalancing after insertion of a new node; a specification ending in ++B or −−B
requires a single rotation; and a specification ending in +−B or −+B requires a
double rotation. When k links appear in the specification, step A6 has to adjust
exactly k − 1 balance factors. Thus the specifications give the essential facts that
govern the running time of steps A6 to A10.

Fig. 23. Classification codes that specify the behavior of Algorithm A after insertion.

Empirical tests on random numbers for 100 ≤ N ≤ 2000 gave the approximate
probabilities shown in Table 1 for paths of various types; apparently these
probabilities rapidly approach limiting values as N → ∞. Table 2 gives the
exact probabilities corresponding to Table 1 when N = 10, considering the 10!
permutations of the input as equally probable. (The probabilities that show up
as .143 in Table 1 are actually equal to 1/7, for all N ≥ 7; see exercise 11. Single
and double rotations are equally likely when N ≤ 15, but double rotations occur
slightly less often when N ≥ 16.)


Table 1

APPROXIMATE PROBABILITIES FOR INSERTING THE NTH ITEM

    Path length k    No rebalancing    Single rotation    Double rotation
          1               .143              .000               .000
          2               .152              .143               .143
          3               .092              .048               .048
          4               .060              .024               .024
          5               .036              .010               .010
         >5               .051              .009               .008
    ave 2.78        total .534              .233               .232

From Table 1 we can see that k is ≤ 2 with probability about .143 + .152 +
.143 + .143 = .581; thus, step A6 is quite simple almost 60 percent of the time.
The average number of balance factors changed from 0 to ±1 in that step is
about 1.8. The average number of balance factors changed from ±1 to 0 in
steps A7 through A10 is approximately .534 + 2(.233 + .232) ≈ 1.5; thus, inserting
one new node adds about 1.8 − 1.5 = 0.3 unbalanced nodes, on the average. This
agrees with the fact that about 68 percent of all nodes were found to be balanced
in random trees built by Algorithm A.

Table 2

EXACT PROBABILITIES FOR INSERTING THE 10TH ITEM

    Path length k    No rebalancing    Single rotation    Double rotation
          1               1/7               0                  0
          2               6/35              1/7                1/7
          3               4/21              2/35               2/35
          4               0                 1/21               1/21
    ave 247/105     total 53/105            26/105             26/105
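The summary rows of Table 2 and the balance-factor bookkeeping derived from Table 1 can be recomputed as a consistency check. In this Python sketch the table entries are copied from the text; the variable names are hypothetical. Exact fractions are used for Table 2 so that the totals come out exactly.

```python
from fractions import Fraction as Fr

# Table 2 (N = 10): path length k -> (no rebalancing, single rotation, double rotation).
table2 = {
    1: (Fr(1, 7), Fr(0), Fr(0)),
    2: (Fr(6, 35), Fr(1, 7), Fr(1, 7)),
    3: (Fr(4, 21), Fr(2, 35), Fr(2, 35)),
    4: (Fr(0), Fr(1, 21), Fr(1, 21)),
}
column_totals = [sum(row[c] for row in table2.values()) for c in range(3)]
average_k = sum(k * sum(row) for k, row in table2.items())
print(column_totals)   # [Fraction(53, 105), Fraction(26, 105), Fraction(26, 105)]
print(average_k)       # 247/105

# Table 1 bookkeeping: step A6 turns about (ave k) - 1 factors from 0 to +-1, and
# steps A7-A10 turn about .534 + 2(.233 + .232) factors from +-1 back to 0.
to_nonzero = 2.78 - 1
to_zero = 0.534 + 2 * (0.233 + 0.232)
print(round(to_nonzero, 2), round(to_zero, 3))   # 1.78 1.464
```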

An approximate model of the behavior of Algorithm A has been proposed
by C. C. Foster [Proc. ACM Nat. Conf. 20 (1965), 192-205]. This model is
not rigorously accurate, but it is close enough to the truth to give some insight.
Let us assume that p is the probability that the balance factor of a given node
in a large tree built by Algorithm A is 0; then the balance factor is +1 with
probability ½(1 − p), and it is −1 with the same probability ½(1 − p). Let us
assume further (without justification) that the balance factors of all nodes are
independent. Then the probability that step A6 sets exactly k − 1 balance factors
nonzero is $p^{k-1}(1 - p)$, so the average value of k is 1/(1 − p). The probability
that we need to rotate part of the tree is q ≈ ½. Inserting a new node should
increase the number of balanced nodes by p, on the average; this number is
actually increased by 1 in step A5, by −p/(1 − p) in step A6, by q in step A7,
and by 2q in step A8 or A9, so we should have

$$p = 1 - p/(1 - p) + 3q \approx 5/2 - p/(1 - p).$$

Solving for p yields fair agreement with Table 1:

$$p \approx \frac{9 - \sqrt{41}}{4} \approx 0.649; \qquad 1/(1 - p) \approx 2.851. \tag{14}$$
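Clearing the denominator in Foster's equation gives the quadratic 2p² − 9p + 5 = 0, whose root in (0, 1) is the value in (14). A short Python check (an illustration only, not part of Foster's paper):

```python
import math

# p = 5/2 - p/(1 - p)  <=>  2p^2 - 9p + 5 = 0; take the root lying in (0, 1).
p = (9 - math.sqrt(41)) / 4
assert abs(p - (2.5 - p / (1 - p))) < 1e-12    # p satisfies the model's equation
print(round(p, 3), round(1 / (1 - p), 3))      # 0.649 2.851
```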

The running time of the search phase of Program A (lines 01-19) is

$$10C + C1 + 2D + 2 - 3S, \tag{15}$$

where C, C1, S are the same as in previous algorithms of this chapter and D is
the number of unbalanced nodes encountered on the search path. Empirical tests
show that we may take D ≈ ⅓C, C1 ≈ ½(C + S), C + S ≈ 1.01 lg N + 0.1, so the
average search time is approximately 11.3 lg N + 3 − 13.7S units. (If searching is
done much more often than insertion, we could of course use a separate, faster
program for searching, since it would be unnecessary to look at the balance
factors; the average running time for a successful search would then be only
about (6.6 lg N − 3.4)u, and the worst case running time would in fact be better
than the average running time obtained with Program 6.2.2T.)
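The step from (15) to "11.3 lg N + 3 − 13.7S" is a substitution that is easy to get wrong by hand. This Python sketch redoes it with the empirical estimates quoted above, writing C = L − S where L = 1.01 lg N + 0.1. The variable names are hypothetical.

```python
# Substitute D ~ C/3 and C1 ~ (C + S)/2 into 10C + C1 + 2D + 2 - 3S.
coef_C = 10 + 1 / 2 + 2 * (1 / 3)    # total coefficient of C
coef_S_extra = 1 / 2 - 3             # S/2 from C1, then -3S
# Now put C = L - S with L = 1.01 lg N + 0.1.
lgN_coef = coef_C * 1.01
const = coef_C * 0.1 + 2
S_coef = -coef_C + coef_S_extra
print(round(lgN_coef, 1), round(const, 1), round(S_coef, 1))   # 11.3 3.1 -13.7
```

The constant term comes out as about 3.1, which the text rounds to 3.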
The running time of the insertion phase of Program A (lines 20-45) is 8F +
26 + (0, 1, or 2) units, when the search is unsuccessful. The data of Table 1
indicate that F ≈ 1.8 on the average. The rebalancing phase (lines 46-101)
takes either 16.5, 8, 27.5, or 45.5 (±0.5) units, depending on whether we increase
the total height, or simply exit without rebalancing, or do a single or double
rotation. The first case almost never occurs, and the others occur with the
approximate probabilities .534, .233, .232, so the average running time of the
combined insertion-rebalancing portion of Program A is about 63u.
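The 63u estimate is a weighted average, which the following Python lines reproduce. One assumption is added and flagged in the comments: the "(0, 1, or 2)" term is taken as its middle value 1. The rare height-increasing case is ignored, as in the text.

```python
F = 1.8                                  # average F, from Table 1
insertion = 8 * F + 26 + 1               # lines 20-45; "(0, 1, or 2)" taken as 1 (assumption)
rebalance = 0.534 * 8 + 0.233 * 27.5 + 0.232 * 45.5   # lines 46-101: exit, single, double
print(round(insertion + rebalance))      # 63
```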

These figures indicate that maintenance of a balanced tree in memory is
reasonably fast, even though the program is rather lengthy. If the input data
are random, the simple tree insertion algorithm of Section 6.2.2 is roughly 50u
faster per insertion; but the balanced tree algorithm is guaranteed to be reliable
even with nonrandom input data.

One way to compare Program A with Program 6.2.2T is to consider the
worst case of the latter. If we study the amount of time necessary to insert N
keys in increasing order into an initially empty tree, it turns out that Program A
is slower for N ≤ 26 and faster for N ≥ 27.

Linear list representation. Now let us return to the claim made at the
beginning of this section, that balanced trees can be used to represent linear
lists in such a way that we can insert items rapidly (overcoming the difficulty
of sequential allocation), yet we can also perform random accesses to list items
(overcoming the difficulty of linked allocation).

The idea is to introduce a new field in each node, called the RANK field. The
field indicates the relative position of that node in its subtree, namely one plus
the number of nodes in its left subtree. Figure 24 shows the RANK values for the
binary tree of Fig. 23. We can eliminate the KEY field entirely; or, if desired, we
can have both KEY and RANK fields, so that it is possible to retrieve items either
by their key value or by their relative position in the list.

Using such a RANK field, retrieval by position is a straightforward modification
of the search algorithms we have been studying.
Algorithm B (Tree search by position). Given a linear list represented as a
binary tree, this algorithm finds the kth element of the list (the kth node of the
tree in symmetric order), given k. The binary tree is assumed to have LLINK
and RLINK fields and a header as in Algorithm A, plus a RANK field as described
above.

B1. [Initialize.] Set M ← k, P ← RLINK(HEAD).

B2. [Compare.] If P = Λ, the algorithm terminates unsuccessfully. (This can
happen only if k was greater than the number of nodes in the tree, or
k ≤ 0.) Otherwise if M < RANK(P), go to B3; if M > RANK(P), go to B4; and
if M = RANK(P), the algorithm terminates successfully (P points to the kth
node).

B3. [Move left.] Set P ← LLINK(P) and return to B2.

B4. [Move right.] Set M ← M − RANK(P) and P ← RLINK(P) and return to B2. ∎
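Algorithm B translates almost line for line into Python. In this sketch the `RNode` class and `find_by_position` are hypothetical names, `None` stands for Λ, and the root is passed in directly instead of through RLINK(HEAD). The example tree holds the list a, b, c, d, e, with RANK = 1 + (size of left subtree) filled in by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RNode:
    value: str
    rank: int = 1                        # 1 + number of nodes in the left subtree
    llink: Optional["RNode"] = None
    rlink: Optional["RNode"] = None


def find_by_position(root, k):
    """Return the kth node in symmetric order, or None if there is none."""
    m, p = k, root                       # B1. Initialize.
    while p is not None:                 # B2. Compare (P = Λ means failure).
        if m < p.rank:
            p = p.llink                  # B3. Move left.
        elif m > p.rank:
            m -= p.rank                  # B4. Move right.
            p = p.rlink
        else:
            return p
    return None


root = RNode("b", 2, RNode("a"), RNode("d", 2, RNode("c"), RNode("e")))
print([find_by_position(root, k).value for k in range(1, 6)])   # ['a', 'b', 'c', 'd', 'e']
print(find_by_position(root, 0), find_by_position(root, 6))     # None None
```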

The only new point of interest in this algorithm is the manipulation of M in
step B4. We can modify the insertion procedure in a similar way, although the
details are somewhat trickier:

Algorithm C (Balanced tree insertion by position). Given a linear list represented
as a balanced binary tree, this algorithm inserts a new node just before
the kth element of the list, given k and a pointer Q to the new node. If k = N + 1,
the new node is inserted just after the last element of the list.

The binary tree is assumed to be nonempty and to have LLINK, RLINK and
B fields and a header, as in Algorithm A, plus a RANK field as described above.
This algorithm is merely a transcription of Algorithm A; the difference is that
it uses and updates the RANK fields instead of the KEY fields.

Cl.  [Initialize.]  Set  T 4-  HEAD,  S 4-  P 4-  RLINK  (HEAD) , U 4-  M 4-  fc. 

C2.  [Compare.]  If  M < RANK(P),  go  to  C3,  otherwise  go  to  C4. 

C3.  [Move  left.]  Set  RANK(P)  4—  RANK(P)  + 1 (we  will  be  inserting  a new  node 
to  the  left  of  P).  Set  R 4-  LLINK (P).  If  R = A,  set  LLINK (P)  4-  Q and  go 
to  C5.  Otherwise  if  B(R)  / 0 set  T 4—  P,  S 4—  R,  and  U 4-  M.  Finally  set 
P 4—  R and  return  to  C2. 

C4. [Move right.] Set M ← M − RANK(P), and R ← RLINK(P). If R = Λ, set
RLINK(P) ← Q and go to C5. Otherwise if B(R) ≠ 0 set T ← P, S ← R, and
U ← M. Finally set P ← R and return to C2.

C5. [Insert.] Set RANK(Q) ← 1, LLINK(Q) ← RLINK(Q) ← Λ, B(Q) ← 0.

C6. [Adjust balance factors.] Set M ← U. (This restores the former value of M
when P was S; all RANK fields are now properly set.) If M < RANK(S), set
R ← P ← LLINK(S) and a ← −1; otherwise set R ← P ← RLINK(S), a ← +1,
and M ← M − RANK(S). Then repeatedly do the following operations
until P = Q: If M < RANK(P), set B(P) ← −1 and P ← LLINK(P); if
M > RANK(P), set B(P) ← +1 and M ← M − RANK(P) and P ← RLINK(P).
(If M = RANK(P), then P = Q and we proceed to the next step.)

C7. [Balancing act.] Several cases now arise.

i) If B(S) = 0, set B(S) ← a, LLINK(HEAD) ← LLINK(HEAD) + 1, and
terminate the algorithm.

ii) If B(S) = −a, set B(S) ← 0 and terminate the algorithm.

iii) If B(S) = a, go to step C8 if B(R) = a, to C9 if B(R) = −a.

C8. [Single rotation.] Set P ← R, LINK(a,S) ← LINK(−a,R), LINK(−a,R) ← S,
B(S) ← B(R) ← 0. If a = +1, set RANK(R) ← RANK(R) + RANK(S); if
a = −1, set RANK(S) ← RANK(S) − RANK(R). Go to C10.

C9. [Double rotation.] Do all the operations of step A9 (Algorithm A). Then
if a = +1, set RANK(R) ← RANK(R) − RANK(P), RANK(P) ← RANK(P) +
RANK(S); if a = −1, set RANK(P) ← RANK(P) + RANK(R), then RANK(S) ←
RANK(S) − RANK(P).

C10. [Finishing touch.] If S = RLINK(T) then set RLINK(T) ← P, otherwise set
LLINK(T) ← P. ■
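A compact way to experiment with Algorithm C is a recursive variant. The Python sketch below is not a transcription of steps C1 to C10: it keeps an explicit height in each node instead of the two-bit balance factor B(P), and it rebalances as the recursion unwinds instead of remembering S and T. What it shares with Algorithm C is the RANK bookkeeping: RANK(P) is increased whenever we go left (step C3), k is reduced by RANK(P) whenever we go right (step C4), and the rotation updates are those of step C8. A double rotation is done here as two single rotations, so the RANK changes of step C9 follow automatically. All names are hypothetical.

```python
from typing import Optional


class Node:
    def __init__(self, value):
        self.value = value
        self.rank = 1                  # RANK: 1 + size of the left subtree
        self.height = 1                # height stands in for the balance factor B
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def h(t):
    return t.height if t is not None else 0


def fix(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t


def rotate_left(s):
    """Single rotation with a = +1: the right child R of S becomes the root."""
    r = s.right
    s.right, r.left = r.left, s
    r.rank += s.rank                   # RANK(R) ← RANK(R) + RANK(S)
    fix(s)
    return fix(r)


def rotate_right(s):
    """Single rotation with a = -1: the left child R of S becomes the root."""
    r = s.left
    s.left, r.right = r.right, s
    s.rank -= r.rank                   # RANK(S) ← RANK(S) − RANK(R)
    fix(s)
    return fix(r)


def rebalance(t):
    fix(t)
    if h(t.right) - h(t.left) == 2:
        if h(t.right.left) > h(t.right.right):   # double rotation needed
            t.right = rotate_right(t.right)
        return rotate_left(t)
    if h(t.left) - h(t.right) == 2:
        if h(t.left.right) > h(t.left.left):
            t.left = rotate_left(t.left)
        return rotate_right(t)
    return t


def insert_at(t, k, q):
    """Insert node q just before the kth node of subtree t (k = size + 1 appends)."""
    if t is None:
        return q
    if k <= t.rank:                    # as in C2/C3: the new node goes left of t
        t.rank += 1
        t.left = insert_at(t.left, k, q)
    else:                              # as in C4: skip RANK(t) positions
        t.right = insert_at(t.right, k - t.rank, q)
    return rebalance(t)


def values(t):
    return values(t.left) + [t.value] + values(t.right) if t is not None else []
```

Storing heights costs more space than the two bits of B(P), but it keeps the sketch short; appending to a list of N items is the call `insert_at(root, N + 1, Node(x))`.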

*Deletion,  concatenation,  etc.  It  is  possible  to  do  many  other  things  to 
balanced  trees  and  maintain  the  balance,  but  the  algorithms  are  sufficiently 
lengthy  that  the  details  are  beyond  the  scope  of  this  book.  We  shall  discuss 
the  general  ideas  here,  and  an  interested  reader  will  be  able  to  fill  in  the  details 
without  much  difficulty. 

The problem of deletion can be solved in O(log N) steps if we approach it
correctly [C. C. Foster, “A Study of AVL Trees,” Goodyear Aerospace Corp.
report GER-12158 (April 1965)]. In the first place we can reduce deletion of
an arbitrary node to the simple deletion of a node P for which LLINK(P) or
RLINK(P) is Λ, as in Algorithm 6.2.2D. The algorithm should also be modified
so that it constructs a list of pointers that specify the path to node P, namely

$$(P_0, a_0),\ (P_1, a_1),\ \ldots,\ (P_l, a_l), \tag{16}$$

where $P_0$ = HEAD, $a_0 = +1$; LINK($a_i$, $P_i$) = $P_{i+1}$, for $0 \le i < l$; $P_l$ = P; and
LINK($a_l$, $P_l$) = Λ. This list can be placed on an auxiliary stack as we search down
the tree. The process of deleting node P sets LINK($a_{l-1}$, $P_{l-1}$) ← LINK($-a_l$, $P_l$),
and we must adjust the balance factor at node $P_{l-1}$. Suppose that we need to
adjust the balance factor at node $P_k$, because the $a_k$ subtree of this node has
just decreased in height; the following adjustment procedure should be used: If
k = 0, set LLINK(HEAD) ← LLINK(HEAD) − 1 and terminate the algorithm, since
the whole tree has decreased in height. Otherwise look at the balance factor
B($P_k$); there are three cases:

i) B($P_k$) = $a_k$. Set B($P_k$) ← 0, decrease k by 1, and repeat the adjustment
procedure for this new value of k.

ii) B($P_k$) = 0. Set B($P_k$) to $-a_k$ and terminate the deletion algorithm.

iii) B($P_k$) = $-a_k$. Rebalancing is required!

The situations that require rebalancing are almost the same as we met in the
insertion algorithm; referring again to (1), A is node $P_k$, and B is the node
LINK($-a_k$, $P_k$), on the opposite branch from where the deletion has occurred.
The only new feature is that node B might be balanced; this leads to a new
Case 3, which is like Case 1 except that β has height h + 1. In the former
cases, rebalancing as in (2) means that we decrease the height, so we set
LINK($a_{k-1}$, $P_{k-1}$) to the root of (2), decrease k by 1, and restart the adjustment
procedure for this new value of k. In Case 3 we do a single rotation, and this
leaves the balance factors of both A and B nonzero without changing the overall
height; after making LINK($a_{k-1}$, $P_{k-1}$) point to node B, we therefore terminate
the algorithm.

The  important  difference  between  deletion  and  insertion  is  that  deletion 
might  require  up  to  log  N rotations,  while  insertion  never  needs  more  than  one. 
The  reason  for  this  becomes  clear  if  we  try  to  delete  the  rightmost  node  of  a 
Fibonacci  tree  (see  Fig.  8 in  Section  6.2.1).  But  empirical  tests  show  that  only 
about  0.21  rotations  per  deletion  are  actually  needed,  on  the  average. 
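The contrast between deletion and insertion is easy to observe experimentally. The Python sketch below is a recursive, height-based AVL deletion, not the stack-driven procedure built on the list (16), although it uses the same single and double rotations, including a single rotation for the new Case 3 in which node B is balanced. It counts rotations (a double rotation counts as two here). The helper `fibonacci_tree` is hypothetical and uses its own indexing: `fibonacci_tree(h)` has height h, left subtree `fibonacci_tree(h - 1)`, and right subtree `fibonacci_tree(h - 2)`.

```python
stats = {"rotations": 0}


class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(t):
    return t.height if t is not None else 0


def fix(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t


def rotate_left(s):
    stats["rotations"] += 1
    r = s.right
    s.right, r.left = r.left, s
    fix(s)
    return fix(r)


def rotate_right(s):
    stats["rotations"] += 1
    r = s.left
    s.left, r.right = r.right, s
    fix(s)
    return fix(r)


def rebalance(t):
    fix(t)
    if h(t.right) - h(t.left) == 2:
        if h(t.right.left) > h(t.right.right):   # double rotation
            t.right = rotate_right(t.right)
        return rotate_left(t)                    # single rotation (also Case 3)
    if h(t.left) - h(t.right) == 2:
        if h(t.left.right) > h(t.left.left):
            t.left = rotate_left(t.left)
        return rotate_right(t)
    return t


def delete(t, key):
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    else:
        if t.left is None or t.right is None:    # node with a null link: splice it out
            return t.left if t.left is not None else t.right
        s = t.right
        while s.left is not None:                # symmetric successor of t
            s = s.left
        t.key = s.key
        t.right = delete(t.right, s.key)
    return rebalance(t)


def fibonacci_tree(order, first_key=1):
    """Return (tree, next_key); keys are assigned in symmetric order."""
    if order <= 0:
        return None, first_key
    left, k = fibonacci_tree(order - 1, first_key)
    node = Node(k)
    right, next_key = fibonacci_tree(order - 2, k + 1)
    node.left, node.right = left, right
    return fix(node), next_key


def keys(t):
    return keys(t.left) + [t.key] + keys(t.right) if t is not None else []
```

For `fibonacci_tree(8)`, which has 54 nodes, deleting the largest key causes three single rotations, one at each node on the right spine above the deleted node.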

The use of balanced trees for linear list representation suggests also the
need for a concatenation algorithm, where we want to insert an entire tree $L_2$ to
the right of tree $L_1$, without destroying the balance. An elegant algorithm for
concatenation was first devised by Clark A. Crane: Assume that height($L_1$) ≥
height($L_2$); the other case is similar. Delete the first node of $L_2$, calling it the
juncture node J, and let $L_2'$ be the new tree for $L_2 \setminus \{J\}$. Now go down the right
links of $L_1$ until reaching a node P such that

$$\operatorname{height}(P) - \operatorname{height}(L_2') = 0 \text{ or } 1;$$

this is always possible, since the height changes by 1 or 2 each time we go down
one level. Then replace the subtree rooted at P by a new subtree whose root is J,
with left subtree P and right subtree $L_2'$, and proceed to adjust $L_1$ as if the new
node J had just been inserted by Algorithm A.
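Crane's concatenation can be sketched recursively: go down the right links of the taller tree until the height difference is 0 or 1, hang J there with P on its left and the shorter tree on its right, and rebalance on the way back up, as after an insertion. The Python below is a sketch under our own assumptions (explicit heights rather than balance factors, `None` for Λ, hypothetical function names); `pop_min` plays the role of deleting the first node of $L_2$ to obtain the juncture node J.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        fix(self)


def h(t):
    return t.height if t is not None else 0


def fix(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t


def rotate_left(s):
    r = s.right
    s.right, r.left = r.left, s
    fix(s)
    return fix(r)


def rotate_right(s):
    r = s.left
    s.left, r.right = r.right, s
    fix(s)
    return fix(r)


def rebalance(t):
    fix(t)
    if h(t.right) - h(t.left) == 2:
        if h(t.right.left) > h(t.right.right):
            t.right = rotate_right(t.right)
        return rotate_left(t)
    if h(t.left) - h(t.right) == 2:
        if h(t.left.right) > h(t.left.left):
            t.left = rotate_left(t.left)
        return rotate_right(t)
    return t


def join(left, j, right):
    """Concatenate left, node j, right (keys of left < j.key < keys of right)."""
    if h(left) > h(right) + 1:           # go down the right links of the taller tree
        left.right = join(left.right, j, right)
        return rebalance(left)
    if h(right) > h(left) + 1:           # the symmetric case
        right.left = join(left, j, right.left)
        return rebalance(right)
    j.left, j.right = left, right        # heights differ by 0 or 1: J goes here
    return fix(j)


def pop_min(t):
    """Delete the first node of t; return (remaining tree, that node)."""
    if t.left is None:
        return t.right, t
    t.left, first = pop_min(t.left)
    return rebalance(t), first


def concatenate(l1, l2):
    if l2 is None:
        return l1
    rest, j = pop_min(l2)
    return join(l1, j, rest)


def keys(t):
    return keys(t.left) + [t.key] + keys(t.right) if t is not None else []
```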

Crane also solved the more difficult inverse problem, to split a list into two
parts whose concatenation would be the original list. Consider, for example,
the problem of splitting the list in Fig. 20 to obtain two lists, one containing
{A, ..., I} and the other containing {J, ..., Q}; a major reassembly of the subtrees
is required. In general, when we want to split a tree at some given node P, the
path to P will be something like that in Fig. 25. We wish to construct a left
tree that contains the nodes of $\alpha_1, P_1, \alpha_4, P_4, \alpha_6, P_6, \alpha_7, P_7, \alpha, P$ in symmetric
order, and a right tree that contains $\beta, P_8, \beta_8, P_5, \beta_5, P_3, \beta_3, P_2, \beta_2$. This can be
done by a sequence of concatenations: First insert P at the right of α, then
concatenate β with $\beta_8$ using $P_8$ as juncture node, concatenate $\alpha_7$ with αP using
$P_7$ as juncture node, $\alpha_6$ with $\alpha_7 P_7 \alpha P$ using $P_6$, $\beta P_8 \beta_8$ with $\beta_5$ using $P_5$, etc.; the
nodes $P_8, P_7, \ldots, P_1$ on the path to P are used as juncture nodes. Crane proved
that this splitting algorithm takes only O(log N) units of time, when the original
tree contains N nodes; the essential reason is that concatenation using a given
juncture node takes O(k) steps, where k is the difference in heights between the
trees being concatenated, and the values of k that must be summed essentially
form a telescoping series for both the left and right trees being constructed.
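Splitting can likewise be sketched on top of a join operation. In the recursive Python below, which assumes a hypothetical height-based representation and our own function names, each node on the search path serves as a juncture node, and the pieces are joined as the recursion unwinds, from the bottom of the path upward, in the spirit of the sequence of concatenations just described. The left tree receives every key ≤ `key` (node P itself included) and the right tree receives the rest.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        fix(self)


def h(t):
    return t.height if t is not None else 0


def fix(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t


def rotate_left(s):
    r = s.right
    s.right, r.left = r.left, s
    fix(s)
    return fix(r)


def rotate_right(s):
    r = s.left
    s.left, r.right = r.right, s
    fix(s)
    return fix(r)


def rebalance(t):
    fix(t)
    if h(t.right) - h(t.left) == 2:
        if h(t.right.left) > h(t.right.right):
            t.right = rotate_right(t.right)
        return rotate_left(t)
    if h(t.left) - h(t.right) == 2:
        if h(t.left.right) > h(t.left.left):
            t.left = rotate_left(t.left)
        return rotate_right(t)
    return t


def join(left, j, right):
    """Concatenate left, j, right, using j as the juncture node."""
    if h(left) > h(right) + 1:
        left.right = join(left.right, j, right)
        return rebalance(left)
    if h(right) > h(left) + 1:
        right.left = join(left, j, right.left)
        return rebalance(right)
    j.left, j.right = left, right
    return fix(j)


def split(t, key):
    """Return (L, R): keys <= key go to L, keys > key go to R."""
    if t is None:
        return None, None
    if key < t.key:                       # t and its right subtree belong to R
        left, right = split(t.left, key)
        return left, join(right, t, t.right)
    left, right = split(t.right, key)     # t and its left subtree belong to L
    return join(t.left, t, left), right


def keys(t):
    return keys(t.left) + [t.key] + keys(t.right) if t is not None else []
```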

All of these algorithms can be used with either KEY or RANK fields or both,
although in the case of concatenation the keys of $L_2$ must all be greater than
the keys of $L_1$. For general purposes it is often preferable to use a triply linked
tree,  with  UP  links  as  well  as  LLINKs  and  RLINKs,  together  with  a new  one-bit 
field  that  specifies  whether  a node  is  the  left  or  right  child  of  its  parent.  The 
triply  linked  tree  representation  simplifies  the  algorithms  slightly,  and  allows  us 
to  specify  nodes  in  the  tree  without  explicitly  tracing  the  path  to  that  node;  we 
can  write  a subroutine  to  delete  NODE(P),  given  P,  or  to  delete  the  node  that 
follows  NODE(P)  in  symmetric  order,  or  to  find  the  list  containing  NODE(P),  etc. 
In  the  deletion  algorithm  for  triply  linked  trees  it  is  unnecessary  to  construct  the 
list  (16),  since  the  UP  links  provide  the  information  we  need.  Of  course,  a triply 
linked  tree  requires  us  to  change  a few  more  links  when  insertions,  deletions,  and 
rotations  are  being  performed.  The  use  of  a triply  linked  tree  instead  of  a doubly 
linked  tree  is  analogous  to  the  use  of  two-way  linking  instead  of  one-way:  We  can 
start  at  any  point  and  go  either  forward  or  backward.  A complete  description  of 
list  algorithms  based  on  triply  linked  balanced  trees  appears  in  Clark  A.  Crane’s 
Ph.D.  thesis  (Stanford  University,  1972). 

Alternatives  to  AVL  trees.  Many  other  ways  have  been  proposed  to  organize 
trees  so  that  logarithmic  accessing  time  is  guaranteed.  For  example,  C.  C.  Foster 
[CACM  16  (1973),  513-517]  considered  the  binary  trees  that  arise  when  we  allow 
the  height  difference  of  subtrees  to  be  at  most  k.  Such  structures  have  been  called 
HB(k) (meaning “height-balanced”), so that ordinary balanced trees represent
the  special  case  HB(1). 
The interesting concept of weight-balanced trees has been studied by J. Nievergelt,
E. Reingold, and C. K. Wong. Instead of considering the height of trees,
they stipulate that the subtrees of all nodes must satisfy

$$\sqrt{2} - 1 < \frac{\text{left weight}}{\text{right weight}} < \sqrt{2} + 1, \tag{17}$$


where  the  left  and  right  weights  count  the  number  of  external  nodes  in  the 
left  and  right  subtrees,  respectively.  It  is  possible  to  show  that  weight  balance 
can  be  maintained  under  insertion,  using  only  single  and  double  rotations  for 
rebalancing  as  in  Algorithm  A (see  exercise  25).  However,  it  may  be  necessary 
to  do  several  rebalancings  during  a single  insertion.  It  is  possible  to  relax 
the  conditions  of  (17),  decreasing  the  amount  of  rebalancing  at  the  expense 
of  increased  search  time. 
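Condition (17) is easy to test mechanically. The Python sketch below represents a binary tree as nested pairs `(left, right)`, with `None` for an external node; this representation and the function names are our own. Because $(\sqrt{2} - 1)(\sqrt{2} + 1) = 1$, the condition is symmetric in the two weights, and squaring turns it into exact integer comparisons: writing $s$ for left weight plus right weight, (17) holds exactly when $s^2 > 2\,(\text{right weight})^2$ and $s^2 > 2\,(\text{left weight})^2$.

```python
def weight_check(t):
    """Return (weight, ok): weight counts the external nodes of t, and ok tells
    whether every internal node of t satisfies condition (17)."""
    if t is None:
        return 1, True                         # a single external node
    lw, left_ok = weight_check(t[0])
    rw, right_ok = weight_check(t[1])
    s = lw + rw
    # sqrt(2)-1 < lw/rw  <=>  s*s > 2*rw*rw ;  lw/rw < sqrt(2)+1  <=>  s*s > 2*lw*lw
    here_ok = s * s > 2 * rw * rw and s * s > 2 * lw * lw
    return s, left_ok and right_ok and here_ok


def is_weight_balanced(t):
    return weight_check(t)[1]
```

For example, a root whose left subtree is a single external node and whose right subtree contains two internal nodes has weights 1 and 3; since 1/3 < √2 − 1 ≈ 0.414, it violates (17).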

Weight-balanced  trees  may  seem  at  first  glance  to  require  more  memory 
than  plain  balanced  trees,  but  in  fact  they  sometimes  require  slightly  less!  If  we 
already  have  a RANK  field  in  each  node,  for  the  linear  list  representation,  this  is 
precisely  the  left  weight,  and  it  is  possible  to  keep  track  of  the  corresponding 
right  weights  as  we  move  down  the  tree.  But  it  appears  that  the  bookkeeping 
required  for  maintaining  weight  balance  takes  more  time  than  Algorithm  A,  and 
the  elimination  of  two  bits  per  node  is  probably  not  worth  the  trouble. 


Why  don't  you  pair  ’em  up  in  threes? 
— attributed  to  YOGI  BERRA  (c.  1970) 

Another interesting alternative to AVL trees, called “2-3 trees,” was introduced
by John Hopcroft in 1970 [see Aho, Hopcroft, and Ullman, The Design
and Analysis of Computer Algorithms (Reading, Mass.: Addison-Wesley, 1974),
Chapter 4]. The idea is to have either 2-way or 3-way branching at each node,
and to stipulate that all external nodes appear on the same level. Every internal
node contains either one or two keys, as shown in Fig. 26.


Insertion  into  a 2-3  tree  is  somewhat  easier  to  explain  than  insertion  into  an 
AVL  tree:  If  we  want  to  put  a new  key  into  a node  that  contains  just  one  key, 
we  simply  insert  it  as  the  second  key.  On  the  other  hand,  if  the  node  already 
contains  two  keys,  we  divide  it  into  two  one-key  nodes,  and  insert  the  middle  key 
into  the  parent  node.  This  may  cause  the  parent  node  to  be  divided  in  a similar 
way,  if  it  already  contains  two  keys.  Figure  27  shows  the  process  of  inserting  a 
new  key  into  the  2-3  tree  of  Fig.  26. 
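The splitting rule just described fits in a few lines of Python. In this sketch (our own representation, not Hopcroft's), a node holds a sorted list of one or two keys and a list of children that is empty at the bottom level, where all children are external nodes; `_insert` returns the middle key and the new right-hand node whenever a node had to be divided, so that the caller can insert them into the parent. Keys are assumed to be distinct.

```python
import bisect


class Node23:
    def __init__(self, keys, children=None):
        self.keys = keys                      # one or two keys, in increasing order
        self.children = children or []        # [] at the bottom level


def _insert(node, key):
    """Insert key below node; return (middle_key, new_right_node) if node splits."""
    if node.children:
        i = bisect.bisect_left(node.keys, key)  # which branch to follow
        pushed = _insert(node.children[i], key)
        if pushed is None:
            return None
        middle, right = pushed
        node.keys.insert(i, middle)           # the parent absorbs the middle key
        node.children.insert(i + 1, right)
    else:
        bisect.insort(node.keys, key)
    if len(node.keys) <= 2:
        return None
    # Three keys: divide into two one-key nodes and pass the middle key upward.
    right = Node23([node.keys[2]], node.children[2:])
    middle = node.keys[1]
    node.keys, node.children = [node.keys[0]], node.children[:2]
    return middle, right


def insert(root, key):
    if root is None:
        return Node23([key])
    pushed = _insert(root, key)
    if pushed is None:
        return root
    middle, right = pushed
    return Node23([middle], [root, right])    # the tree grows only at the root


def keys_in_order(node):
    if node is None:
        return []
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += keys_in_order(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Because a new level is created only when the root divides, all external nodes stay on the same level, as the definition requires.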
Fig. 27. Inserting the new key “M” into the 2-3 tree of Fig. 26.

Hopcroft  observed  that  deletion,  concatenation,  and  splitting  can  all  be 
done  with  2-3  trees,  in  a reasonably  straightforward  manner  analogous  to  the 
corresponding  operations  with  AVL  trees. 

R. Bayer [Proc. ACM-SIGFIDET Workshop (1971), 219-235] proposed an
interesting binary tree representation for 2-3 trees. See Fig. 28, which shows the
binary tree representation of Fig. 26; one bit in each node is used to distinguish
“horizontal” RLINKs from “vertical” ones. Note that the keys of the tree appear
from left to right in symmetric order, just as in any binary search tree. It turns
out that the transformations we need to perform on such a binary tree, while inserting
a new key as in Fig. 27, are precisely the single and double rotations used
while inserting a new key into an AVL tree, although we need just one version
of each rotation, not the left-right reflections needed by Algorithms A and C.


Fig.  28.  The  2-3  tree  of  Fig.  26  represented  as  a binary  search  tree. 

Elaboration of these ideas has led to many additional flavors of balanced
trees, most notably the red-black trees, also called symmetric binary B-trees or
half-balanced trees [R. Bayer, Acta Informatica 1 (1972), 290-306; L. Guibas
and R. Sedgewick, FOCS 19 (1978), 8-21; H. J. Olivié, RAIRO Informatique
Théorique 16 (1982), 51-71; R. E. Tarjan, Inf. Proc. Letters 16 (1983), 253-257;
T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms
(MIT Press, 1990), Chapter 14; R. Sedgewick, Algorithms in C (Addison-Wesley,
1997), §13.4]. There is also a strongly related family called hysterical B-trees or
(a, b)-trees, notably (2, 4)-trees [D. Maier and S. C. Salveter, Inf. Proc. Letters 12
(1981), 199-202; S. Huddleston and K. Mehlhorn, Acta Informatica 17 (1982),
157-184].
When some keys are accessed much more frequently than others, we want
the important ones to be relatively close to the root, as in the optimum binary
search trees of Section 6.2.2. Dynamic trees that make it possible to maintain
weighted balance within a constant factor of the optimum, called biased trees,
have  been  developed  by  S.  W.  Bent,  D.  D.  Sleator,  and  R.  E.  Tarjan,  SICOMP 
14  (1985),  545-568;  J.  Feigenbaum  and  R.  E.  Tarjan,  Bell  System  Tech.  J.  62 
(1983),  3139-3158.  The  algorithms  are,  however,  quite  complicated. 

A much simpler self-adjusting data structure called a splay tree was developed
subsequently by D. D. Sleator and R. E. Tarjan [JACM 32 (1985), 652-686],
based  on  ideas  like  the  move-to-front  and  transposition  heuristics  discussed  in 
Section  6.1;  similar  techniques  had  previously  been  explored  by  B.  Allen  and 
I.  Munro  [JACM  25  (1978),  526-535]  and  by  J.  Bitner  [SICOMP  8 (1979), 
82-110].  Splay  trees,  like  the  other  kinds  of  balanced  trees  already  mentioned, 
support  the  operations  of  concatenation  and  splitting  as  well  as  insertion  and 
deletion,  and  in  a particularly  simple  way.  Moreover,  the  time  needed  to  access 
data  in  a splay  tree  is  known  to  be  at  most  a small  constant  multiple  of  the  access 
time  of  a statically  optimum  tree,  when  amortized  over  any  series  of  operations. 
Indeed,  Sleator  and  Tarjan  conjectured  that  the  total  splay  tree  access  time  is 
at  most  a constant  multiple  of  the  optimum  time  to  access  data  and  to  perform 
rotations  dynamically  by  any  binary  tree  algorithm  whatsoever. 
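Splaying is the heart of the Sleator–Tarjan structure: after each access, the accessed node is rotated to the root by zig-zig and zig-zag steps, each of which handles two levels at once. The recursive Python formulation below is a common textbook rendering, not the top-down splaying procedure of Sleator and Tarjan's paper, and the names are ours; `insert` splays first and then makes the new key the root.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(t, key):
    """Bring key, or the last node on its search path, to the root of t."""
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:                    # zig-zig: rotate at the grandparent first
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                  # zig-zag: rotate at the parent first
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return t if t.left is None else rotate_right(t)
    if t.right is None:
        return t
    if key > t.right.key:                       # mirror-image zig-zig
        t.right.right = splay(t.right.right, key)
        t = rotate_left(t)
    elif key < t.right.key:                     # mirror-image zig-zag
        t.right.left = splay(t.right.left, key)
        if t.right.left is not None:
            t.right = rotate_right(t.right)
    return t if t.right is None else rotate_left(t)


def insert(t, key):
    """Splay on key, then put the new key at the root (duplicates are ignored)."""
    t = splay(t, key)
    if t is not None and t.key == key:
        return t
    n = Node(key)
    if t is not None:
        if key < t.key:                         # t is the successor of key
            n.left, n.right, t.left = t.left, t, None
        else:                                   # t is the predecessor of key
            n.right, n.left, t.right = t.right, t, None
    return n
```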

Randomization leads to methods that appear to be even simpler and faster
than splay trees. Jean Vuillemin [CACM 23 (1980), 229-239] introduced Cartesian
trees, in which every node has two keys (x, y). The x parts are ordered
from left to right as in binary search trees; the y parts are ordered from top to
bottom as in the priority queue trees of Section 5.2.3. C. R. Aragon and R. G.
Seidel gave this data structure the more colorful name treap, because it neatly
combines the notions of trees and heaps. Exactly one treap can be formed with
n given key pairs $(x_1, y_1), \ldots, (x_n, y_n)$, if the x's and y's are distinct. One way to
obtain it is to insert the x's by Algorithm 6.2.2T according to the order of the y's;
but there is also a simple algorithm that inserts any new key pair directly into any
treap. Aragon and Seidel observed [FOCS 30 (1989), 540-546] that if the x's are
ordinary keys while the y's are chosen at random, we can be sure that the treap
has the shape of a random binary search tree. In particular, a treap with random
y values will always be reasonably well balanced, except with exponentially small
probability (see exercise 5.2.2-42). Aragon and Seidel also showed that treaps
can readily be biased so that, for example, a key x with relative frequency f
will appear suitably near the root when it is associated with $y = U^{1/f}$, where
U is a random number between 0 and 1. Treaps performed consistently better
than splay trees in some experiments conducted by D. E. Knuth relating to the
calculation of convex hulls [Lecture Notes in Comp. Sci. 606 (1992), 53-55].
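The direct insertion algorithm mentioned above can be sketched as follows: insert the pair by its x part as in an ordinary binary search tree, then rotate it upward while its y exceeds its parent's y, restoring heap order with the largest y at the top. The Python is an illustration with our own names; `biased_priority` shows the $y = U^{1/f}$ rule quoted above, with `random.random()` supplying U.

```python
import random


class TreapNode:
    def __init__(self, x, y):
        self.x, self.y = x, y              # x: search key, y: priority (largest on top)
        self.left = self.right = None


def rotate_right(t):
    s = t.left
    t.left, s.right = s.right, t
    return s


def rotate_left(t):
    s = t.right
    t.right, s.left = s.left, t
    return s


def insert(t, x, y=None):
    """Insert the pair (x, y); a random y gives a random-BST-shaped treap."""
    if y is None:
        y = random.random()
    if t is None:
        return TreapNode(x, y)
    if x < t.x:
        t.left = insert(t.left, x, y)
        if t.left.y > t.y:                 # heap order violated: rotate upward
            t = rotate_right(t)
    else:
        t.right = insert(t.right, x, y)
        if t.right.y > t.y:
            t = rotate_left(t)
    return t


def biased_priority(f):
    """y = U**(1/f): keys with larger relative frequency f tend to sit nearer the root."""
    return random.random() ** (1.0 / f)


def shape(t):
    return None if t is None else (shape(t.left), t.x, t.y, shape(t.right))
```

Because exactly one treap exists for a given set of pairs with distinct x's and y's, inserting the same pairs in any order yields the same tree.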

†A new Section 6.2.5 devoted to randomized data structures is planned for
the next edition of the present book. It will discuss “skip lists” [W. Pugh,
CACM 33 (1990), 668-676] and “randomized binary search trees” [S. Roura and
C. Martínez, JACM 45 (1998), 288-323] as well as treaps.
EXERCISES

1. [01] In Case 2 of (1), why isn't it a good idea to restore the balance by simply
interchanging the left subtrees of A and B?

2.  [16]  Explain  why  the  tree  has  gotten  one  level  higher  if  we  reach  step  A7  with 
B(S)  = 0. 

► 3. [M25] Prove that a balanced tree with N internal nodes never contains more than
$(\phi - 1)N \approx 0.61803N$ nodes whose balance factor is nonzero.

4. [M22] Prove or disprove: Among all balanced trees with $F_{h+1} - 1$ internal nodes,
the Fibonacci tree of order h has the greatest internal path length.

► 5. [M25] Prove or disprove: If Algorithm A is used to insert the keys $K_2, \ldots, K_N$
successively in increasing order into a tree that initially contains only the single key
$K_1$, where $K_1 < K_2 < \cdots < K_N$, then the tree produced is always optimum (that is,
it has minimum internal path length over all N-node binary trees).

6.  [M21]  Prove  that  Eq.  (5)  defines  the  generating  function  for  balanced  trees  of 
height  h. 

7. [M27] (A. V. Aho and N. J. A. Sloane.) Prove the remarkable formula (9) for the
number of balanced trees of height h. [Hint: Let $C_n = B_n + B_{n-1}$, and use the fact
that $\log(C_{n+1}/C_n^2)$ is exceedingly small for large n.]

8. [M24] (L. A. Khizder.) Show that there is a constant β such that
$B_h'(1)/B_h(1) = 2^h\beta - 1 + O(2^h/B_{h-1})$ as $h \to \infty$.

9. [HM44] What is the asymptotic number of balanced binary trees with n internal
nodes, $\sum_{h \ge 0} B_{nh}$? What is the asymptotic average height, $\sum_{h \ge 0} h B_{nh} \big/ \sum_{h \ge 0} B_{nh}$?

► 10. [27] (R. C. Richards.) Show that the shape of a balanced tree can be constructed
uniquely from the list of its balance factors B(1)B(2) . . . B(N) in symmetric order.

11. [M24] (Mark R. Brown.) Prove that when n > 6 the average number of external
nodes of each of the types +A, −A, ++B, +−B, −+B, −−B is exactly (n + 1)/14, in a random
balanced tree of n internal nodes constructed by Algorithm A.

► 12.  [24]  What  is  the  maximum  possible  running  time  of  Program  A when  the  eighth 
node  is  inserted  into  a balanced  tree?  What  is  the  minimum  possible  running  time  for 
this  insertion? 

13.  [05]  Why  is  it  better  to  use  RANK  fields  as  defined  in  the  text,  instead  of  simply 
to  store  the  index  of  each  node  as  its  key  (calling  the  first  node  “1”,  the  second  node 
“2”,  and  so  on)? 

14. [11] Could Algorithms 6.2.2T and 6.2.2D be adapted to work with linear lists,
using  a RANK  field,  just  as  the  balanced  tree  algorithms  of  this  section  have  been  so 
adapted? 

15.  [18]  (C.  A.  Crane.)  Suppose  that  an  ordered  linear  list  is  being  represented  as 
a binary  tree,  with  both  KEY  and  RANK  fields  in  each  node.  Design  an  algorithm  that 
searches  the  tree  for  a given  key,  K,  and  determines  the  position  of  K in  the  list;  that  is, 
it  finds  the  number  m such  that  K is  the  mth  smallest  key. 

► 16.  [20]  Draw  the  balanced  tree  that  is  obtained  after  node  E and  the  root  node  F are 
deleted  from  Fig.  20,  using  the  deletion  algorithm  suggested  in  the  text. 

► 17.  [21]  Draw  the  balanced  trees  that  are  obtained  after  the  Fibonacci  tree  (12) 
is  concatenated  (a)  to  the  right,  (b)  to  the  left,  of  the  tree  in  Fig.  20,  using  the 
concatenation  algorithm  suggested  in  the  text. 
18. [22] Draw the balanced trees that are obtained after Fig. 20 is split into two parts
{A, ..., I} and {J, ..., Q}, using the splitting algorithm suggested in the text.

► 19.  [26]  Find  a way  to  transform  a given  balanced  tree  so  that  the  balance  factor  at 
the  root  is  not  -1.  Your  transformation  should  preserve  the  symmetric  order  of  the 
nodes; and it should produce another balanced tree in O(1) units of time, regardless of
the  size  of  the  original  tree. 

20.  [40]  Explore  the  idea  of  using  the  restricted  class  of  balanced  trees  whose  nodes 
all  have  balance  factors  of  0 or  +1.  (Then  the  length  of  the  B field  can  be  reduced  to 
one  bit.)  Is  there  a reasonably  efficient  insertion  procedure  for  such  trees? 

► 21. [30] (Perfect balancing.) Design an algorithm to construct N-node binary trees
that  are  optimum  in  the  sense  of  exercise  5.  Your  algorithm  should  use  O(N)  steps  and 
it  should  be  “online,”  in  the  sense  that  it  inputs  the  nodes  one  by  one  in  increasing  order 
and  builds  partial  trees  as  it  goes,  without  knowing  the  final  value  of  N in  advance.  (It 
would  be  appropriate  to  use  such  an  algorithm  when  restructuring  a badly  balanced 
tree,  or  when  merging  the  keys  of  two  trees  into  a single  tree.) 

22.  [M20]  What  is  the  analog  of  Theorem  A,  for  weight-balanced  trees? 

23.  [M20]  (E.  Reingold.)  Demonstrate  that  there  is  no  simple  relation  between 
height-balanced  trees  and  weight-balanced  trees: 

a)  Prove  that  there  exist  height-balanced  trees  that  have  an  arbitrarily  small  ratio 
(left  weight) /(right  weight)  in  the  sense  of  (17). 

b) Prove that there exist weight-balanced trees that have an arbitrarily large difference
between left and right subtree heights.

24. [M22] (E. Reingold.) Prove that if we strengthen condition (17) to

$$\frac{1}{2} < \frac{\text{left weight}}{\text{right weight}} < 2,$$

the only binary trees that satisfy this condition are perfectly balanced trees with $2^n - 1$
internal nodes. (In such trees, the left and right weights are exactly equal at all nodes.)

25.  [21]  (J.  Nievergelt,  E.  Reingold,  C.  Wong.)  Show  that  it  is  possible  to  design 
an  insertion  algorithm  for  weight-balanced  trees  so  that  condition  (17)  is  preserved, 
making at most O(log N) rotations per insertion.

26.  [40]  Explore  the  properties  of  balanced  t-ary  trees,  for  t > 2. 

► 27.  [M23]  Estimate  the  maximum  number  of  comparisons  needed  to  search  in  a 2-3 
tree  with  N internal  nodes. 

28.  [41]  Prepare  efficient  implementations  of  2-3  tree  algorithms. 

29.  [M47]  Analyze  the  average  behavior  of  2-3  trees  under  random  insertions. 

30.  [26]  (E.  McCreight.)  Section  2.5  discusses  several  strategies  for  dynamic  storage 
allocation,  including  best-fit  (choosing  an  available  area  as  small  as  possible  from  among 
all  those  that  fulfill  the  request)  and  first-fit  (choosing  the  available  area  with  lowest 
address  among  all  those  that  fulfill  the  request).  Show  that  if  the  available  space  is 
linked together as a balanced tree in an appropriate way, it is possible to do (a) best-fit or
(b) first-fit allocation in only O(log n) units of time, where n is the number of available
areas. (The algorithms given for those methods in Section 2.5 take order n steps.)

31. [34] (M. L. Fredman, 1975.) Invent a representation of linear lists with the
property that insertion of a new item between positions m − 1 and m, given m, takes
O(log m) units of time.
32. [M27] Given two n-node binary trees, T and T′, let us say that $T \preceq T'$ if T′ can
be obtained from T by a sequence of zero or more rotations to the right. Prove that
$T \preceq T'$ if and only if $r_k \le r'_k$ for $1 \le k \le n$, where $r_k$ and $r'_k$ denote the respective sizes
of the right subtrees of the kth nodes of T and T′ in symmetric order.

► 33.  [25]  (A.  L.  Buchsbaum.)  Explain  how  to  encode  the  balance  factors  of  an  AVL 
tree  implicitly,  thus  saving  two  bits  per  node,  at  the  expense  of  additional  work  when 
the  tree  is  accessed. 


Samuel  considered  the  nation  of  Israel,  tribe  by  tribe, 
and  the  tribe  of  Benjamin  was  picked  by  lot. 
Then  he  considered  the  tribe  of  Benjamin,  family  by  family, 
and  the  family  of  Matri  was  picked  by  lot. 
Then  he  considered  the  family  of  Matri,  man  by  man, 
and  Saul  son  of  Kish  was  picked  by  lot. 
But  when  they  looked  for  Saul  he  could  not  be  found. 

— 1 Samuel  10:20-21 


6.2.4.  Multiway  Trees 

The  tree  search  methods  we  have  been  discussing  were  developed  primarily  for 
internal  searching,  when  we  want  to  look  at  a table  that  is  contained  entirely 
within  a computer’s  high-speed  internal  memory.  Let’s  now  consider  the  problem 
of  external  searching,  when  we  want  to  retrieve  information  from  a very  large 
file  that  appears  on  direct  access  storage  units  such  as  disks  or  drums.  (An 
introduction  to  disks  and  drums  appears  in  Section  5.4.9.) 

Tree  structures  lend  themselves  nicely  to  external  searching,  if  we  choose 
an  appropriate  way  to  represent  the  tree.  Consider  the  large  binary  search 
tree  shown  in  Fig.  29,  and  imagine  that  it  has  been  stored  in  a disk  file.  (The 
LLINKs  and  RLINKs  of  the  tree  are  now  disk  addresses  instead  of  internal  memory 
addresses.)  If  we  search  this  tree  in  a naive  manner,  simply  applying  the 
algorithms  we  have  learned  for  internal  tree  searching,  we  will  have  to  make 
about lg N disk accesses before our search is complete. When N is a million,
this means we will need 20 or so seeks. But suppose we divide the table into
7-node  “pages,”  as  shown  by  the  dotted  lines  in  Fig.  29;  if  we  access  one  page  at 
a time,  we  need  only  about  one  third  as  many  seeks,  so  the  search  goes  about 
three  times  as  fast! 

Grouping  the  nodes  into  pages  in  this  way  essentially  changes  the  tree  from 
a binary tree to an octonary tree, with 8-way branching at each page-node. If
we  let  the  pages  be  still  larger,  with  128-way  branching  after  each  disk  access, 
we  can  find  any  desired  key  in  a million-entry  table  after  looking  at  only  three 
pages.  We  can  keep  the  root  page  in  the  internal  memory  at  all  times,  so  that 
only  two  references  to  the  disk  are  required  even  though  the  internal  memory 
never  needs  to  hold  more  than  254  keys  at  any  time. 
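The page counts quoted above can be checked with a little arithmetic: a complete tree with m-way branching and L levels holds $m^L - 1$ keys, so the number of levels needed for N keys is the least L with $m^L - 1 \ge N$. The helper below is our own illustration, assuming completely filled pages.

```python
def levels_needed(n_keys: int, m: int) -> int:
    """Least L such that a complete m-ary tree of L levels (m - 1 keys per node)
    holds at least n_keys keys, i.e. m**L - 1 >= n_keys."""
    levels = 0
    while m ** levels - 1 < n_keys:
        levels += 1
    return levels
```

For a million keys this gives 20 levels for a binary tree, 7 levels of 7-node pages (8-way branching), and 3 levels of 127-key pages (128-way branching); keeping the root page in memory and reading one more page at a time never requires more than 2 × 127 = 254 keys in memory.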

Of  course  we  don’t  want  to  make  the  pages  arbitrarily  large,  since  the 
internal  memory  size  is  limited  and  also  since  it  takes  a long  time  to  read  a 
large  page.  For  example,  suppose  that  it  takes  72.5  + 0.05m  milliseconds  to  read 
a page that allows m-way branching. The internal processing time per page will
Fig. 29. A large binary search tree can be divided into “pages.”

be about a + b lg m, where a is small compared to 72.5 ms, so the total amount
of time needed for searching a large table is approximately proportional to lg N
times

$$\frac{72.5 + 0.05m}{\lg m} + b.$$

This quantity achieves a minimum when m ≈ 307; actually the minimum is
very “broad,” since a nearly optimum value is achieved for all m between 200 and
500. In practice there will be a similar range of good values for m, based on the
characteristics  of  particular  external  memory  devices  and  on  the  length  of  the 
records  in  the  table. 
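The location of the minimum can be confirmed numerically. The Python below (function name ours) evaluates $(72.5 + 0.05m)/\lg m$ for integer m, dropping the additive constant b, which does not affect where the minimum falls. Setting the derivative to zero gives the condition $m(\ln m - 1) = 1450$, whose root lies near 306.8, so the integer minimum is at m = 307; every m from 200 to 500 comes within 3 percent of that optimum.

```python
import math


def relative_cost(m: int) -> float:
    """Search time per unit of lg N for m-way pages, ignoring the constant b."""
    return (72.5 + 0.05 * m) / math.log2(m)


best = min(range(2, 2001), key=relative_cost)
# How much worse than the best is the worst page size in the range 200..500?
worst_ratio = max(relative_cost(m) for m in range(200, 501)) / relative_cost(best)
```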

W. I. Landauer [IEEE Trans. EC-12 (1963), 863-871] suggested building an
m-ary tree by requiring level l to become nearly full before anything is allowed
to  appear  on  level  l + 1.  This  scheme  requires  a rather  complicated  rotation 
method,  since  we  may  have  to  make  major  changes  throughout  the  tree  just  to 
insert  a single  new  item;  Landauer  was  assuming  that  we  need  to  search  for  items 
in  the  tree  much  more  often  than  we  need  to  insert  or  delete  them. 

When  a file  is  stored  on  disk,  and  is  subject  to  comparatively  few  insertions 
and  deletions,  a three-level  tree  is  appropriate,  where  the  first  level  of  branching 
determines  what  cylinder  is  to  be  used,  the  second  level  of  branching  determines 
the  appropriate  track  on  that  cylinder,  and  the  third  level  contains  the  records 
themselves.  This  method  is  called  indexed-sequential  file  organization  [see  JACM 
16 (1969), 569-571].

R.  Muntz  and  R.  Uzgalis  [Proc.  Princeton  Conf.  on  Inf.  Sciences  and  Systems 
4 (1970),  345-349]  suggested  modifying  the  tree  search  and  insertion  method, 
Algorithm  6.2.2T,  so  that  all  insertions  go  onto  nodes  belonging  to  the  same 
page  as  their  parent  node,  whenever  possible;  if  that  page  is  full,  a new  page 
is  started,  whenever  possible.  If  the  number  of  pages  is  unlimited,  and  if  the 
data  arrives  in  random  order,  it  can  be  shown  that  the  average  number  of  page 
accesses is approximately $H_N/(H_m - 1)$, only slightly more than we would obtain
in  the  best  possible  m-ary  tree.  (See  exercise  8.) 
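Exercise 8 gives recurrences for the generating functions of page accesses under the Muntz and Uzgalis scheme, with at most $M$ keys per page. Setting $z = 1$ in those recurrences and in their derivatives yields exact averages for moderate $N$. The following Python sketch (names are illustrative; it assumes $M \ge 2$ and uses floating point) iterates them and returns the total probability, which must stay 1, together with $C'_N$, the mean number of page accesses per unsuccessful search.

```python
import math

def page_access_stats(N: int, M: int):
    """Iterate the exercise-8 recurrences at z = 1 for pages holding at most M keys.

    a[j] holds B_N^(j)(1), the probability that an unsuccessful search ends below a
    page containing j keys; d[j] holds the derivative B_N^(j)'(1). Index 0 is unused.
    Returns (sum of the a[j], C'_N). Assumes M >= 2 and N >= 1.
    """
    if M < 2 or N < 1:
        raise ValueError("need M >= 2 and N >= 1")
    a = [0.0] * (M + 1)
    d = [0.0] * (M + 1)
    a[1] = d[1] = 1.0                       # B_1^(j)(z) = delta_{j1} z
    for n in range(2, N + 1):
        s = n + 1
        na = [0.0] * (M + 1)
        nd = [0.0] * (M + 1)
        # j = 1: one-key pages start one page-level below full pages (the 2z term).
        na[1] = ((n - 2) * a[1] + 2 * a[M]) / s
        nd[1] = ((n - 2) * d[1] + 2 * (a[M] + d[M])) / s
        for j in range(2, M):               # 1 < j < M
            na[j] = ((n - j - 1) * a[j] + (j + 1) * a[j - 1]) / s
            nd[j] = ((n - j - 1) * d[j] + (j + 1) * d[j - 1]) / s
        na[M] = ((n - 1) * a[M] + (M + 1) * a[M - 1]) / s
        nd[M] = ((n - 1) * d[M] + (M + 1) * d[M - 1]) / s
        a, d = na, nd
    return sum(a[1:]), sum(d[1:])

N, M = 2000, 10
total, c = page_access_stats(N, M)
# No arrangement of pages holding at most M keys can average fewer than
# log_{M+1}(N+1) page accesses (Kraft's inequality), so print that for comparison.
print(total, c, math.log(N + 1, M + 1))
```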

B-trees. A new approach to external searching by means of multiway tree
branching was discovered in 1970 by R. Bayer and E. McCreight [Acta Informatica
1 (1972), 173-189], and independently at about the same time by M. Kaufman
[unpublished]. Their idea, based on a versatile new kind of data structure
called a B-tree, makes it possible both to search and to update a large file with
guaranteed efficiency, in the worst case, using comparatively simple algorithms.

A B-tree  of  order  m is  a tree  that  satisfies  the  following  properties: 

i)  Every  node  has  at  most  m children. 

ii)  Every  node,  except  for  the  root  and  the  leaves,  has  at  least  m/2  children. 

iii)  The  root  has  at  least  2 children  (unless  it  is  a leaf). 

iv)  All  leaves  appear  on  the  same  level,  and  carry  no  information. 

v)  A nonleaf  node  with  k children  contains  k - 1 keys. 

(As usual, a “leaf” is a terminal node, one with no children. Since the leaves
carry no information, we may regard them as external nodes that aren’t really
in the tree, so that Λ is a pointer to a leaf.)

Figure  30  shows  a B-tree  of  order  7.  Each  node  (except  for  the  root  and  the 
leaves) has between $\lceil 7/2\rceil$ and 7 children, so it contains 3, 4, 5, or 6 keys. The
root  node  is  allowed  to  contain  from  1 to  6 keys;  in  this  case  it  has  2.  All  of  the 
leaves  are  at  level  3.  Notice  that  (a)  the  keys  appear  in  increasing  order  from 
left  to  right,  using  a natural  extension  of  the  concept  of  symmetric  order;  and 
(b)  the  number  of  leaves  is  exactly  one  greater  than  the  number  of  keys. 

B-trees of order 1 or 2 are obviously uninteresting, so we will consider only
the case $m \ge 3$. The 2-3 trees defined at the close of Section 6.2.3 are equivalent
to B-trees of order 3. (Bayer and McCreight considered only the case that m is
odd; some authors consider a B-tree of order m to be what we are calling a
B-tree of order 2m + 1.)

A node that contains $j$ keys and $j + 1$ pointers can be represented as

$$(P_0, K_1, P_1, K_2, P_2, \ldots, P_{j-1}, K_j, P_j) \qquad (1)$$

where $K_1 < K_2 < \cdots < K_j$ and $P_i$ points to the subtree for keys between
$K_i$ and $K_{i+1}$. Therefore searching in a B-tree is quite straightforward: After
node (1) has been fetched into the internal memory, we search for the given
argument among the keys $K_1, K_2, \ldots, K_j$. (When $j$ is large, we probably do a
binary search; but when $j$ is smallish, a sequential search is best.) If the search
is successful, we have found the desired key; but if the search is unsuccessful
because the argument lies between $K_i$ and $K_{i+1}$, we fetch the node indicated
by $P_i$ and continue the process. The pointer $P_0$ is used if the argument is less
than $K_1$, and $P_j$ is used if the argument is greater than $K_j$. If $P_i = \Lambda$, the search
is unsuccessful.
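As a minimal in-memory sketch of this search in Python (the class and function names are hypothetical, and `None` stands in for Λ), a node keeps its keys in a sorted list and the pointers $P_0, \ldots, P_j$ in a parallel list; a binary search within the node picks the index $i$ of the pointer to follow.

```python
from bisect import bisect_left
from typing import List, Optional

class BNode:
    """A B-tree node holding keys K_1 < ... < K_j and pointers P_0, ..., P_j."""
    def __init__(self, keys: List[int], children: Optional[List[Optional["BNode"]]] = None):
        self.keys = keys
        # On the bottom level every pointer is Λ; None stands in for Λ here.
        self.children = children if children is not None else [None] * (len(keys) + 1)

def btree_search(root: Optional[BNode], key: int):
    """Return (node, index) with node.keys[index] == key, or None if the search fails."""
    node = root
    while node is not None:
        # i = number of keys less than the argument (binary search within the node).
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        # The argument lies between K_i and K_{i+1} (1-based), so follow P_i;
        # i == 0 selects P_0 and i == j selects P_j.
        node = node.children[i]
    return None

# A small two-level example whose bottom-level pointers are all Λ.
left, mid, right = BNode([5, 10]), BNode([25, 30]), BNode([50, 60])
root = BNode([20, 40], [left, mid, right])
print(btree_search(root, 30) is not None, btree_search(root, 35))
```

For small nodes a sequential scan of `node.keys` would do equally well, as the text notes.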

The nice thing about B-trees is that insertion is also quite simple. Consider
Fig. 30, for example; every leaf corresponds to a place where a new insertion
might happen. If we want to insert the new key 337, we simply change the
appropriate node on level 2, which still has room, so that 337 takes its place
among the other keys in increasing order:

[Diagram (2): the level-2 node of Fig. 30 before and after the insertion of 337.]   (2)

Fig. 30. A B-tree of order 7, with all leaves on level 3. Every node contains 3, 4, 5, or 6
keys. The leaf that precedes key 449 has been marked A; see (8).


On  the  other  hand,  if  we  want  to  insert  the  new  key  071,  there  is  no  room  since 
the  corresponding  node  on  level  2 is  already  “full.”  This  case  can  be  handled  by 
splitting  the  node  into  two  parts,  with  three  keys  in  each  part,  and  passing  the 
middle  key  up  to  level  1: 


[Diagram (3): the full level-2 node, with 071 added, becomes two nodes of three keys
each, and the middle key moves up into the node on level 1.]   (3)


In general, if we want to insert a new item into a B-tree of order $m$, when
all the leaves are at level $l$, we insert the new key into the appropriate node on
level $l - 1$. If that node now contains $m$ keys, so that it has the form (1) with
$j = m$, we split it into two nodes


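$$\overbrace{(P_0, K_1, P_1, \ldots, K_{\lceil m/2\rceil - 1}, P_{\lceil m/2\rceil - 1})}^{P}
\qquad
\overbrace{(P_{\lceil m/2\rceil}, K_{\lceil m/2\rceil + 1}, \ldots, K_m, P_m)}^{P'} \qquad (4)$$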


and insert the key $K_{\lceil m/2\rceil}$ into the parent of the original node. (Thus the pointer
$P$ in the parent node is replaced by the sequence $P, K_{\lceil m/2\rceil}, P'$.) This insertion
may cause the parent node to contain $m$ keys, and if so, it should be split in the
same way. (Figure 27 in the previous section illustrates the case $m = 3$.) If we
need to split the root node, which has no parent, we simply create a new root
node containing the single key $K_{\lceil m/2\rceil}$; the tree gets one level taller in this case.

This insertion procedure neatly preserves all of the B-tree properties; in
order to appreciate the full beauty of the idea, the reader should work exercise 1.
The tree essentially grows up from the top, instead of down from the bottom,
since it gains in height only when the root splits.
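The insertion procedure can be sketched in memory as follows. This Python sketch (class and method names are hypothetical) uses the 2-way split of (4) with no overflow handling: the new key goes into a bottom-level node, and while a node holds $m$ keys it keeps $\lceil m/2\rceil - 1$ of them, passes $K_{\lceil m/2\rceil}$ to its parent, and hands the rest to a new right sibling. It also counts splits so that the bound derived from (7) can be checked.

```python
import random
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children        # None on the bottom level, where every pointer is Λ

class BTree:
    """In-memory B-tree of order m with the 2-way splitting of (4); no overflow handling."""

    def __init__(self, m: int):
        if m < 3:
            raise ValueError("order m must be at least 3")
        self.m = m
        self.root = Node()
        self.splits = 0                  # every split, including root splits

    def insert(self, key) -> bool:
        # Search down to the bottom level, remembering (node, pointer index) on the way.
        path, node = [], self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return False             # key already present
            if node.children is None:
                break
            path.append((node, i))
            node = node.children[i]
        node.keys.insert(i, key)
        h = (self.m + 1) // 2            # ceil(m/2)
        while len(node.keys) == self.m:  # node has the form (1) with j = m
            self.splits += 1
            middle = node.keys[h - 1]    # K_{ceil(m/2)} moves up
            right = Node(node.keys[h:],
                         None if node.children is None else node.children[h:])
            node.keys = node.keys[:h - 1]
            if node.children is not None:
                node.children = node.children[:h]
            if not path:                 # the root split: the tree grows one level taller
                self.root = Node([middle], [node, right])
                break
            parent, j = path.pop()
            parent.keys.insert(j, middle)        # P becomes P, K_{ceil(m/2)}, P'
            parent.children.insert(j + 1, right)
            node = parent
        return True

    def stats(self):
        """Return (p, l): the number of nodes and the level l of the leaves (Λ links)."""
        l, node = 1, self.root
        while node.children is not None:
            node, l = node.children[0], l + 1
        def count(n):
            return 1 + (sum(count(c) for c in n.children) if n.children else 0)
        return count(self.root), l

    def inorder(self, node=None):
        node = node or self.root
        if node.children is None:
            yield from node.keys
            return
        for i, k in enumerate(node.keys):
            yield from self.inorder(node.children[i])
            yield k
        yield from self.inorder(node.children[-1])

random.seed(1)
N, tree = 2000, BTree(7)
for k in random.sample(range(10 * N), N):
    tree.insert(k)
p, l = tree.stats()
print(p, l, tree.splits, tree.splits / N)
```

Because every split, including a root split, is counted once, `tree.splits` should equal $p - l$, and `tree.splits / N` should stay below $1/(\lceil m/2\rceil - 1)$.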

Deletion from B-trees is only slightly more complicated than insertion (see
exercise 6).


Upper bounds on the running time. Let us now see how many nodes have
to be accessed in the worst case, while searching in a B-tree of order $m$. Suppose
that there are $N$ keys, and that the $N + 1$ leaves appear on level $l$. Then the
number of nodes on levels 1, 2, 3, ... is at least $2, 2\lceil m/2\rceil, 2\lceil m/2\rceil^2, \ldots$; hence

$$N + 1 \ge 2\lceil m/2\rceil^{l-1}. \qquad (5)$$

In other words,

$$l \le 1 + \log_{\lceil m/2\rceil}\Bigl(\frac{N+1}{2}\Bigr); \qquad (6)$$

this means, for example, that if $N = 1{,}999{,}998$ and $m = 199$, then $l$ is at most 3.
Since we need to access at most $l$ nodes during a search, this formula guarantees
that the running time is quite small.
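Bound (6) can be evaluated exactly with integer arithmetic, which matters here because the example sits right at the edge: the logarithm in (6) is just below 3. The Python function below (its name is illustrative) returns the largest $l$ that satisfies (5).

```python
def max_leaf_level(N: int, m: int) -> int:
    """Largest leaf level l permitted by (5), N + 1 >= 2 * ceil(m/2)**(l - 1).

    Integer arithmetic only; assumes N >= 1 and m >= 3.
    """
    h = -(-m // 2)                 # ceil(m/2)
    l = 1                          # l = 1 always satisfies (5) when N >= 1
    while 2 * h ** l <= N + 1:     # does l + 1 still satisfy (5)?
        l += 1
    return l

print(max_leaf_level(1_999_998, 199))   # 3, the example in the text
print(max_leaf_level(1_999_999, 199))   # 4: one more key, and (6) permits l = 4
```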

When a new key is being inserted, we may have to split as many as $l$ nodes.
However, the average number of nodes that need to be split is much less, since the
total number of splittings that occur while the entire tree is being constructed
is just the total number of internal nodes in the tree, minus $l$. If there are $p$
internal nodes, there are at least $1 + (\lceil m/2\rceil - 1)(p - 1)$ keys; hence

$$p \le 1 + \frac{N - 1}{\lceil m/2\rceil - 1}. \qquad (7)$$

It follows that $(p - l)/N$, the average number of times we need to split a node
while building a tree of $N$ keys, is less than $1/(\lceil m/2\rceil - 1)$ split per insertion.


Refinements and variations. There are several ways to improve upon the
basic B-tree structure defined above, by breaking the rules a little.

In the first place, we note that all of the pointers in the level $l - 1$ nodes
are Λ, and none of the pointers in the other levels are Λ. This often represents a
significant amount of wasted space, so we can save both time and space by
eliminating all the Λ's and using a different value of $m$ for all of the “bottom” nodes.
This use of two different $m$'s does not foul up the insertion algorithm, since both
halves of a node that is being split remain on the same level as the original
node. We could in fact define a generalized B-tree of orders $m_1, m_2, m_3, \ldots$ by
requiring all nonroot nodes on level $l - k$ to have between $m_k/2$ and $m_k$ children;
such a B-tree has different $m$'s on each level, yet the insertion algorithm still
works essentially as before.

To  carry  the  idea  in  the  preceding  paragraph  even  further,  we  might  use 
a completely  different  node  format  in  each  level  of  the  tree,  and  we  might  also 
store  information  in  the  leaves.  Sometimes  the  keys  form  only  a small  part  of 
the  records  in  a file,  and  in  such  cases  it  is  a mistake  to  store  the  entire  records 
in  the  branch  nodes  near  the  root  of  the  tree;  this  would  make  m too  small  for 
efficient  multiway  branching. 

We can therefore reconsider Fig. 30, imagining that all the records of the
file are now stored in the leaves, and that only a few of the keys have been
duplicated in the branch nodes. Under this interpretation, the leftmost leaf
contains all records whose key is ≤ 011; the leaf marked A contains all records
whose key satisfies

$$439 < K \le 449; \qquad (8)$$

and so on. In this scheme the leaf nodes grow and split just as the
branch nodes do, except that a record is never passed up from a leaf to the next
level. Thus the leaves are always at least half filled to capacity. A new key
enters the nonleaf part of the tree whenever a leaf splits. If each leaf is linked
to its successor in symmetric order, we gain the ability to traverse the file both
sequentially and randomly in an efficient and convenient manner. This variant
has become known as a B+-tree.
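The combination of random and sequential access can be illustrated with a toy two-level structure in Python: one branch node of separator keys above a chain of linked leaves. This is a sketch under stated assumptions, not a full B+-tree. The names are hypothetical, and it assumes each separator is the largest key of the leaf to its left, so leaf $i$ holds the keys $K$ with $s_{i-1} < K \le s_i$. A range query makes one random access through the separators and then follows the leaf links.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, records):
        self.records = sorted(records)   # (key, data) pairs kept in key order
        self.next = None                 # link to the successor leaf in symmetric order

def link_leaves(leaves):
    """Chain the leaves and return one separator per boundary: each leaf's largest key."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return [leaf.records[-1][0] for leaf in leaves[:-1]]

def range_scan(separators, leaves, lo, hi):
    """Yield records with lo <= key <= hi: one random access, then sequential traversal."""
    # Leaf i holds keys K with separators[i-1] < K <= separators[i].
    leaf = leaves[bisect_left(separators, lo)]
    while leaf is not None:
        for key, data in leaf.records:
            if key > hi:
                return
            if key >= lo:
                yield key, data
        leaf = leaf.next

leaves = [Leaf([(1, "a"), (3, "b")]), Leaf([(5, "c"), (7, "d")]), Leaf([(9, "e")])]
seps = link_leaves(leaves)
print(seps, list(range_scan(seps, leaves, 2, 8)))
```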
Some calculations by S. P. Ghosh and M. E. Senko [JACM 16 (1969),
569-579]  suggest  that  it  might  be  a good  idea  to  make  the  leaves  fairly  large, 
say  up  to  about  10  consecutive  pages  long.  By  linear  interpolation  in  the  known 
range  of  keys  for  each  leaf,  we  can  guess  which  of  the  10  pages  probably  contains 
a given  search  argument.  If  our  guess  is  wrong,  we  lose  time,  but  experiments 
indicate  that  this  loss  might  be  less  than  the  time  we  save  by  decreasing  the  size 
of  the  tree. 

T.  H.  Martin  [unpublished]  has  pointed  out  that  the  idea  underlying  B-trees 
can  be  used  also  for  variable-length  keys.  We  need  not  put  bounds  [m/2  . . m]  on 
the  number  of  children  of  each  node;  instead  we  can  say  merely  that  each  node 
should  be  at  least  about  half  full  of  data.  The  insertion  and  splitting  mechanism 
still  works  fine,  even  though  the  exact  number  of  keys  per  node  depends  on 
whether  the  keys  are  long  or  short.  However,  the  keys  shouldn’t  be  allowed  to 
get  extremely  long,  or  they  can  mess  things  up.  (See  exercise  5.) 

Another important modification to the basic B-tree scheme is the idea
of overflow introduced by Bayer and McCreight. The idea is to improve the
insertion algorithm by resisting its temptation to split nodes so often; a local
rotation is used instead. Suppose we have a node that is over-full because it
contains $m$ keys and $m + 1$ pointers; instead of splitting it, we can look first
at its sibling node on the right, which has say $j$ keys and $j + 1$ pointers. In
the parent node there is a key $K_f$ that separates the keys of the two siblings;
schematically,

[Diagram (9): the over-full node, with $m$ keys, and its right sibling, with $j$ keys,
separated by the key $K_f$ in their parent.]   (9)

If $j < m - 1$, a simple rearrangement makes splitting unnecessary: We leave
$\lfloor (m+j)/2\rfloor$ keys in the left node, we replace $K_f$ by $K_{\lfloor (m+j)/2\rfloor + 1}$ in the parent
node, and we put the $\lceil (m+j)/2\rceil$ remaining keys (including $K_f$) and the
corresponding pointers into the right node. Thus the full node “flows over” into
its sibling node. On the other hand, if the sibling node is already full ($j = m - 1$),
we can split both of the nodes, making three nodes each about two-thirds full,
containing, respectively, $\lfloor (2m-2)/3\rfloor$, $\lfloor (2m-1)/3\rfloor$, and $\lfloor 2m/3\rfloor$ keys:


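[Diagram (10): the two full siblings become three nodes, with two separating keys in
their parent.]   (10)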


If the original node has no right sibling, we can look at its left sibling in essentially
the same way. (If the original node has both a right and a left sibling, we could
even refrain from splitting off a new node unless both left and right siblings are
full.) Finally, if the original node to be split has no siblings at all, it must be
the root; we can change the definition of B-tree, allowing the root to contain as
many as $2\lfloor (2m-2)/3\rfloor$ keys, so that when the root splits it produces two nodes
of $\lfloor (2m-2)/3\rfloor$ keys each.
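The key counts involved in overflow handling can be tabulated directly. This Python sketch (the function name is hypothetical) returns the sizes of the resulting nodes, left to right, for an over-full node of $m$ keys whose right sibling holds $j$ keys, with the separating parent key taking part in the redistribution.

```python
def overflow_plan(m: int, j: int):
    """Node sizes after handling an over-full node (m keys) next to a sibling with j keys."""
    if not 0 <= j <= m - 1:
        raise ValueError("the sibling must hold between 0 and m - 1 keys")
    if j < m - 1:
        # Rotation: floor((m+j)/2) keys stay left, one moves up to replace K_f,
        # and ceil((m+j)/2) keys (including K_f) end up in the right sibling.
        return ((m + j) // 2, (m + j + 1) // 2)
    # Both full: 2m keys in all (counting K_f), two go to the parent, three nodes share the rest.
    return ((2 * m - 2) // 3, (2 * m - 1) // 3, (2 * m) // 3)

print(overflow_plan(7, 3), overflow_plan(7, 4), overflow_plan(7, 6))
```

For order 7 this prints `(5, 5) (5, 6) (4, 4, 4)`; in the three-way case the sizes always add up to $2m - 2$.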

The effect of all the technicalities in the preceding paragraph is to produce a
superior breed of tree, say a B*-tree of order $m$, which can be defined as follows:

i) Every node except the root has at most $m$ children.

ii) Every node, except for the root and the leaves, has at least $(2m - 1)/3$ children.

iii) The root has at least 2 and at most $2\lfloor (2m-2)/3\rfloor + 1$ children.

iv) All leaves appear on the same level.

v) A nonleaf node with $k$ children contains $k - 1$ keys.

The important change is condition (ii), which asserts that we utilize at least
two-thirds of the available space in every node. This change not only uses space
more efficiently, it also makes the search process faster, since we may replace
$\lceil m/2\rceil$ by $\lceil (2m-1)/3\rceil$ in (6) and (7). However, the insertion process gets
slower, because nodes tend to need more attention as they fill up; see B. Zhang
and M. Hsu, Acta Informatica 26 (1989), 421-438, for an approximate analysis
of the tradeoffs involved.

At  the  other  extreme,  it  is  sometimes  better  to  let  nodes  become  less  than 
half  full  in  a tree  that  changes  quite  frequently,  especially  if  insertions  tend 
to  outnumber  deletions.  This  situation  has  been  analyzed  by  T.  Johnson  and 
D.  Shasha,  J.  Comput.  Syst.  Sci.  47  (1993),  45-76. 

Perhaps  the  reader  has  been  skeptical  of  B-trees  because  the  degree  of  the 
root  can  be  as  low  as  2.  Why  should  we  waste  a whole  disk  access  on  merely 
a 2-way decision?! A simple buffering scheme, called least-recently-used page
replacement, overcomes this objection; we can keep several bufferloads of
information in the internal memory, so that input commands can be avoided when
the  corresponding  page  is  already  present.  Under  this  scheme,  the  algorithms 
for  searching  or  insertion  issue  “virtual  read”  commands  that  are  translated 
into  actual  input  instructions  only  when  the  necessary  page  is  not  in  memory; 
a subsequent  “release”  command  is  issued  when  the  buffer  has  been  read  and 
possibly  modified  by  the  algorithm.  When  an  actual  read  is  required,  the  buffer 
that  has  least  recently  been  released  is  chosen;  we  write  out  that  buffer,  if  its 
contents  have  changed  since  they  were  read  in,  then  we  read  the  desired  page 
into  the  chosen  buffer. 
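The buffering protocol can be sketched in Python with an `OrderedDict` kept in order of release. Everything here is an illustrative assumption: the class name, a plain dict standing in for external memory, and the simplification that a freshly read page counts as recently released until it is actually released. Only modified buffers are written out when chosen for replacement, and actual reads and writes are counted.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Sketch of the "virtual read" / "release" protocol with least-recently-released replacement.

    `disk` is any dict-like mapping from page number to contents (a stand-in for
    external memory).
    """

    def __init__(self, disk, nbuffers: int):
        self.disk = disk
        self.capacity = nbuffers
        # page -> [contents, changed]; iteration order = least recently released first
        self.buffers = OrderedDict()
        self.reads = 0                     # actual input operations
        self.writes = 0                    # actual output operations

    def virtual_read(self, page):
        if page in self.buffers:           # already present: no input command needed
            return self.buffers[page][0]
        if len(self.buffers) >= self.capacity:
            victim, (contents, changed) = self.buffers.popitem(last=False)
            if changed:                    # write out only if modified since it was read
                self.disk[victim] = contents
                self.writes += 1
        self.buffers[page] = [self.disk[page], False]
        self.reads += 1
        return self.buffers[page][0]

    def release(self, page, new_contents=None):
        """Release a page after use, optionally recording that it was modified."""
        entry = self.buffers[page]
        if new_contents is not None:
            entry[0], entry[1] = new_contents, True
        self.buffers.move_to_end(page)     # now the most recently released buffer

disk = {n: f"page {n}" for n in range(5)}
pool = LRUBufferPool(disk, nbuffers=2)
for page in (0, 1, 0, 2):
    pool.virtual_read(page)
    pool.release(page)
print(pool.reads, pool.writes)             # 3 reads, 0 writes: the second read of page 0 was free
```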

Since  the  number  of  levels  in  the  tree  is  generally  small  compared  to  the 
number  of  buffers,  this  paging  scheme  will  ensure  that  the  root  page  is  always 
present  in  memory;  and  if  the  root  has  only  2 or  3 children,  the  first-level  pages 
will  almost  surely  stay  there  too.  Any  pages  that  might  need  to  be  split  during 
an  insertion  are  automatically  present  in  memory  when  they  are  needed,  because 
they  will  be  remembered  from  the  immediately  preceding  search. 

Experiments  by  E.  McCreight  have  shown  that  this  policy  is  quite  successful. 
For  example,  he  found  that  with  10  buffers  and  m = 121,  the  process  of  inserting 
100,000 keys in ascending order required only 22 actual read commands, and only
857 actual write commands; thus most of the activity took place in the internal
memory. Furthermore the tree contained only 835 nodes, just one higher than
the minimum possible value $\lceil 100000/(m-1)\rceil = 834$; thus the storage utilization
was nearly 100 percent. For this experiment he used the overflow technique, but
with only 2-way node splitting as in (4), not 3-way splitting as in (10). (See
exercise 3.)

In  another  experiment,  again  with  10  buffers  and  m = 121  and  the  overflow 
technique,  he  inserted  5000  keys  into  an  initially  empty  tree,  in  random  order; 
this  produced  a 2-level  tree  with  48  nodes  (87  percent  storage  utilization),  after 
making  2762  actual  reads  and  2739  actual  writes.  Then  1000  random  searches 
required  786  actual  reads.  The  same  experiment  without  the  overflow  feature 
produced  a 2-level  tree  with  62  nodes  (67  percent  storage  utilization),  after 
making  2743  actual  reads  and  2800  actual  writes;  1000  subsequent  random 
searches  required  836  actual  reads.  This  shows  not  only  that  the  paging  scheme 
is  effective  but  also  that  it  is  wise  to  handle  overflows  locally  before  deciding  to 
split  a node. 

Andrew Yao has proved that the average number of nodes after random
insertions without the overflow feature will be

$$N/(m \ln 2) + O(N/m^2),$$

for large $N$ and $m$, so the storage utilization will be approximately $\ln 2 \approx 69.3$
percent [Acta Informatica 9 (1978), 159-170]. See also the more detailed analyses
by B. Eisenbarth, N. Ziviani, G. H. Gonnet, K. Mehlhorn, and D. Wood, Information
and Control 55 (1982), 125-174; R. A. Baeza-Yates, Acta Informatica
26 (1989), 439-471.
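As a rough check on the 69.3 percent figure, compare Yao's node count with a capacity of $m - 1$ keys per node, ignoring the $O(N/m^2)$ term:

$$\frac{N}{\bigl(N/(m\ln 2)\bigr)\,(m-1)} \;=\; \frac{m}{m-1}\,\ln 2 \;\longrightarrow\; \ln 2 \approx 0.693 \qquad (m \to \infty).$$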

B-trees became popular soon after they were invented. See, for example,
the  article  by  Douglas  Comer  in  Computing  Surveys  11  (1979),  121-137,  412, 
which  discusses  early  developments  and  describes  a widely  used  system  called 
VSAM  (Virtual  Storage  Access  Method)  developed  by  IBM  Corporation.  One  of 
the  innovations  of  VSAM  was  to  replicate  blocks  on  a disk  track  so  that  latency 
time  was  minimized. 

Two of the most interesting developments of the basic B-tree strategy have
unfortunately been given almost identical names: “SB-trees” and “SB-trees.”
The SB-tree of P. E. O’Neil [Acta Inf. 29 (1992), 241-265] is designed to
minimize disk I/O time by allocating nearby records to the same track or cylinder,
maintaining efficiency in applications where many consecutive records need to be
accessed at the same time; in this case “SB” is in italic type and the S connotes
“sequential.” The SB-tree of P. Ferragina and R. Grossi [STOC 27 (1995),
693-702; SODA 7 (1996), 373-382] is an elegant combination of B-tree structure
with the Patricia trees that we will consider in Section 6.3; in this case “SB”
is in roman type and the S connotes “string.” SB-trees have many applications
to large-scale text processing, and they provide a basis for efficient sorting of
variable-length strings on disk [see Arge, Ferragina, Grossi, and Vitter, STOC
29 (1997), 540-548].
EXERCISES


1. [10] What B-tree of order 7 is obtained after the key 613 is inserted into Fig. 30?
(Do  not  use  the  overflow  technique.) 

2.  [15]  Work  exercise  1,  but  use  the  overflow  technique,  with  3-way  splitting  as 
in  (10). 


► 3.  [23]  Suppose  we  insert  the  keys  1,  2,  3,  . . . in  ascending  order  into  an  initially 
empty  B-tree  of  order  101.  Which  key  causes  the  leaves  to  be  on  level  4 for  the  first  time 

a)  when  we  use  no  overflow? 

b)  when  we  use  overflow  and  only  2-way  splitting  as  in  (4)? 

c)  when  we  use  a B*-tree  of  order  101,  with  overflow  and  3-way  splitting  as  in  (10)? 

4. [21] (Bayer and McCreight.) Explain how to handle insertions into a generalized
B-tree so that all nodes except the root and leaves will be guaranteed to have at least
$\frac34 m - 1$ children.

► 5.  [21]  Suppose  that  a node  represents  1000  character  positions  of  external  memory. 
If  each  pointer  occupies  5 characters,  and  if  the  keys  are  variable  in  length,  between 
5 and  50  characters  long  but  always  a multiple  of  5 characters,  what  is  the  minimum 
number  of  character  positions  occupied  in  a node  after  it  splits  during  an  insertion? 
(Consider  only  a simple  splitting  procedure  analogous  to  that  described  in  the  text 
for  fixed-length-key  B-trees,  without  overflowing;  move  up  the  key  that  makes  the 
remaining  two  parts  most  nearly  equal  in  size.) 

6.  [23]  Design  a deletion  algorithm  for  B-trees. 

7.  [28]  Design  a concatenation  algorithm  for  B-trees  (see  Section  6.2.3). 

► 8. [HM37] Consider the generalization of tree insertion suggested by Muntz and
Uzgalis, where each page can hold $M$ keys. After $N$ random items have been inserted
into such a tree, so that there are $N + 1$ external nodes, let $b^{(j)}_{Nk}$ be the probability that
an unsuccessful search requires $k$ page accesses and that it ends at an external node
whose parent node belongs to a page containing $j$ keys. If $B^{(j)}_N(z) = \sum_k b^{(j)}_{Nk} z^k$ is the
corresponding generating function, prove that we have $B^{(j)}_1(z) = \delta_{j1} z$ and

$$B^{(j)}_N(z) = \frac{N-j-1}{N+1}\,B^{(j)}_{N-1}(z) + \frac{j+1}{N+1}\,B^{(j-1)}_{N-1}(z) \qquad \text{for } 1 < j < M;$$

$$B^{(1)}_N(z) = \frac{N-2}{N+1}\,B^{(1)}_{N-1}(z) + \frac{2z}{N+1}\,B^{(M)}_{N-1}(z);$$

$$B^{(M)}_N(z) = \frac{N-1}{N+1}\,B^{(M)}_{N-1}(z) + \frac{M+1}{N+1}\,B^{(M-1)}_{N-1}(z).$$

Find the asymptotic behavior of $C'_N = \sum_{j=1}^{M} B^{(j)\prime}_N(1)$, the average number of page
accesses per unsuccessful search. [Hint: Express the recurrence in terms of the matrix

$$W(z) = \begin{pmatrix}
-3 & 0 & 0 & \cdots & 0 & 2z \\
3 & -4 & 0 & \cdots & 0 & 0 \\
0 & 4 & -5 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -M-1 & 0 \\
0 & 0 & 0 & \cdots & M+1 & -2
\end{pmatrix}$$

and relate $C'_N$ to an $N$th degree polynomial in $W(1)$.]
9. [22] Can the B-tree idea be used to retrieve items of a linear list by position
instead  of  by  key  value?  (See  Algorithm  6.2.3B.) 

► 10. [35] Discuss how a large file, organized as a B-tree, can be used for concurrent
accessing  and  updating  by  a large  number  of  simultaneous  users,  in  such  a way  that 
users  of  different  pages  rarely  interfere  with  each  other. 


Little  is  known,  even  for  otherwise  equivalent  algorithms, 
about  the  optimization  of  storage  allocation, 
minimization  of  the  number  of  required  operations, 
and  so  on.  This  area  of  investigation 
must  draw  upon  the  most  powerful  resources 
of  both  pure  and  applied  mathematics 
for  further  progress. 

— ANTHONY  G.  OETTINGER  (1961) 
6.3. DIGITAL SEARCHING

Instead  of  basing  a search  method  on  comparisons  between  keys,  we  can 
make  use  of  their  representation  as  a sequence  of  digits  or  alphabetic  characters. 
Consider,  for  example,  the  thumb  index  on  a large  dictionary;  from  the  first 
letter  of  a given  word,  we  can  immediately  locate  the  pages  that  contain  all 
words  beginning  with  that  letter. 

If  we  pursue  the  thumb-index  idea  to  one  of  its  logical  conclusions,  we  come 
up  with  a searching  scheme  based  on  repeated  “subscripting”  as  illustrated  in 
Table  1.  Suppose  that  we  want  to  test  a given  search  argument  to  see  whether  it  is 
one  of  the  31  most  common  words  of  English  (see  Figs.  12  and  13  in  Section  6.2.2). 
The data is represented in Table 1 as a trie structure; this name was suggested
by E. Fredkin [CACM 3 (1960), 490-499] because it is a part of information
retrieval. A trie, pronounced “try,” is essentially an M-ary tree, whose nodes
are M-place vectors with components corresponding to digits or characters. Each
node on level l represents the set of all keys that begin with a certain sequence
of l characters called its prefix; the node specifies an M-way branch, depending
on the (l + 1)st character.

For  example,  the  trie  of  Table  1 has  12  nodes;  node  (1)  is  the  root,  and  we 
look  up  the  first  letter  here.  If  the  first  letter  is,  say,  N,  the  table  tells  us  that  our 
word  must  be  NOT  (or  else  it  isn’t  in  the  table).  On  the  other  hand,  if  the  first 
letter  is  W,  node  (1)  tells  us  to  go  on  to  node  (9),  looking  up  the  second  letter 
in  the  same  way;  node  (9)  says  that  the  second  letter  should  be  A,  H,  or  I.  The 
prefix  of  node  (10)  is  HA.  Blank  entries  in  the  table  stand  for  null  links. 

The  node  vectors  in  Table  1 are  arranged  according  to  MIX  character  code. 
This  means  that  a trie  search  will  be  quite  fast,  since  we  are  merely  fetching 
words  of  an  array  by  using  the  characters  of  our  keys  as  subscripts.  Techniques 
for  making  quick  multiway  decisions  by  subscripting  have  been  called  “table 
look-at” as opposed to “table look-up” [see P. M. Sherman, CACM 4 (1961),
172-173, 175].

Algorithm T (Trie search). Given a table of records that form an M-ary trie,
this algorithm searches for a given argument K. The nodes of the trie are vectors
whose subscripts run from 0 to M − 1; each component of these vectors is either
a key or a link (possibly null).

T1. [Initialize.] Set the link variable P so that it points to the root of the trie.

T2. [Branch.] Set k to the next character of the input argument, K, from left to
right. (If the argument has been completely scanned, we set k to a “blank”
or end-of-word symbol. The character should be represented as a number
in the range 0 ≤ k < M.) Let X be table entry number k in NODE(P). If X
is a link, go to T3; but if X is a key, go to T4.

T3. [Advance.] If X ≠ Λ, set P ← X and return to step T2; otherwise the
algorithm terminates unsuccessfully.

T4. [Compare.] If X = K, the algorithm terminates successfully; otherwise it
terminates unsuccessfully. ▮
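Algorithm T can be transcribed into Python as a sketch. Several details are assumptions: an uppercase-letter alphabet with blank as code 0 (not the MIX character code of Table 1), node vectors as Python lists, and `None` for Λ. The helper `trie_insert` is a hypothetical addition, not part of the text. It builds tries in the style of Table 1, where a key is stored as soon as its prefix distinguishes it from the other keys.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumption: code 0 is the blank/end-of-word symbol
M = len(ALPHABET)

class TrieNode:
    def __init__(self):
        # Each entry is None (a null link), a TrieNode (a link), or a str (a key).
        self.entries = [None] * M

def char_code(key: str, depth: int) -> int:
    """Character number `depth` of key (0-based), or the blank code once key is exhausted."""
    return ALPHABET.index(key[depth]) if depth < len(key) else 0

def trie_search(root: TrieNode, key: str) -> bool:
    p, depth = root, 0                          # T1. Initialize.
    while True:
        x = p.entries[char_code(key, depth)]    # T2. Branch.
        if isinstance(x, TrieNode):             # T3. Advance along a non-null link.
            p, depth = x, depth + 1
        elif x is None:                         # T3. Null link: unsuccessful.
            return False
        else:                                   # T4. Compare with the stored key.
            return x == key

def trie_insert(root: TrieNode, key: str) -> None:
    """Hypothetical helper for building a trie in the style of Table 1."""
    p, depth = root, 0
    while True:
        k = char_code(key, depth)
        x = p.entries[k]
        if isinstance(x, TrieNode):
            p, depth = x, depth + 1
        elif x is None or x == key:
            p.entries[k] = key
            return
        else:
            # Another key shares this prefix: push it down into a new node and keep going.
            child = TrieNode()
            child.entries[char_code(x, depth + 1)] = x
            p.entries[k] = child
            p, depth = child, depth + 1

root = TrieNode()
for word in ["A", "HE", "HER", "HIS", "HAD", "HAVE", "BUT", "BY", "WAS", "WITH", "WHICH"]:
    trie_insert(root, word)
print(trie_search(root, "HAVE"), trie_search(root, "HA"), trie_search(root, "HAT"))
```

This prints `True False False`. A search for WH stops at the stored key WHICH before failing, which is the longest-match behavior mentioned in the text.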
Table 1

A TRIE FOR THE 31 MOST COMMON ENGLISH WORDS

[Table 1 body: twelve node vectors, (1) through (12), each indexed by MIX character code;
each entry is a key (such as A, HE, BUT, or BY), a link to another node, or blank for a
null link.]

Notice  that  if  the  search  is  unsuccessful,  the  longest  match  has  been  found. 
This  property  is  occasionally  useful  in  applications. 

In  order  to  compare  the  speed  of  this  algorithm  to  the  others  in  this  chapter, 
we  can  write  a short  MIX  program  assuming  that  the  characters  are  bytes  and 
that  the  keys  are  at  most  five  bytes  long. 

Program T (Trie search). This program assumes that all keys are represented in
one MIX word, with blank spaces at the right whenever the key has less than five
characters. Since we use MIX character code, each byte of the search argument
is assumed to contain a number less than 30. Links are represented as negative
numbers in the 0:2 field of a node word. rI1 ≡ P, rX ≡ unscanned part of K.

    START   LDX   K           1   T1. Initialize.
            ENT1  ROOT        1   P ← pointer to root of trie.
    2H      SLAX  1           C   T2. Branch.
            STA   *+1(2:2)    C   Extract next character, k.
            ENT2  0,1         C   Q ← P + k.
            LD1N  0,2(0:2)    C   P ← LINK(Q).
            J1P   2B          C   T3. Advance. To T2 if P is a link ≠ Λ.
            LDA   0,2         1   T4. Compare. rA ← KEY(Q).
            CMPA  K           1
            JE    SUCCESS     1   Exit successfully if rA = K.
    FAILURE EQU   *               Exit if not in the trie. ▮

Fig. 31. The trie of Table 1, converted into a “forest.”


The running time of this program is 8C + 8 units, where C is the number of
characters examined. Since C ≤ 5, the search never needs more than 48 units of time.

If  we  now  compare  the  efficiency  of  this  program  (using  the  trie  of  Table  1) 
to  Program  6.2.2T  (using  the  optimum  binary  search  tree  of  Fig.  13),  we  can 
make  the  following  observations. 

1.  The  trie  takes  much  more  memory  space;  we  are  using  360  words  just  to 
represent  31  keys,  while  the  binary  search  tree  uses  only  62  words  of  memory. 
(However,  exercise  4 shows  that,  with  some  fiddling  around,  we  can  actually  fit 
the  trie  of  Table  1 into  only  49  words.) 

2.  A successful  search  takes  about  26  units  of  time  for  both  programs.  But 
an  unsuccessful  search  will  go  faster  in  the  trie,  slower  in  the  binary  search  tree. 
For  this  data  the  search  will  be  unsuccessful  more  often  than  it  is  successful,  so 
the  trie  is  preferable  from  the  standpoint  of  speed. 

3.  If  we  consider  the  KWIC  indexing  application  of  Fig.  15  instead  of  the 
31  commonest  English  words,  the  trie  loses  its  advantage  because  of  the  nature 
of  the  data.  For  example,  a trie  requires  12  iterations  to  distinguish  between 
COMPUTATION  and  COMPUTATIONS.  In  this  case  it  would  be  better  to  build  the 
trie  so  that  words  are  scanned  from  right  to  left  instead  of  from  left  to  right. 

The  abstract  concept  of  a trie  to  represent  a family  of  strings  was  introduced 
by  Axel  Thue,  in  a paper  about  strings  that  do  not  contain  adjacent  repeated 
substrings [Skrifter udgivne af Videnskabs-Selskabet i Christiania, Mathematisk-Naturvidenskabelig
Klasse (1912), No. 1; reprinted in Thue’s Selected Mathematical
Papers (Oslo: Universitetsforlaget, 1977), 413-477].

Trie memory for computer searching was first recommended by René de la
Briandais [Proc. Western Joint Computer Conf. 15 (1959), 295-298]. He pointed
out that we can save memory space at the expense of running time if we use a
linked list for each node vector, since most of the entries in the vectors tend to
be empty. In effect, this idea amounts to replacing the trie of Table 1 by the
forest of trees shown in Fig. 31. Searching in such a forest proceeds by finding
the root that matches the first character, then finding the child node of that root
that matches the second character, etc.
In his article, de la Briandais did not actually stop the tree branching exactly
as shown in Table 1 or Fig. 31; instead, he continued to represent each key,
character by character, until reaching the end-of-word delimiter. Thus he would
actually have used

    [Tree diagram: the “H” tree of Fig. 31 with every key spelled out, character by character, down to the end-of-word delimiter]    (1)

in place of the “H” tree in Fig. 31. This representation requires more storage,
but it makes the processing of variable-length data especially easy. If we use two
link fields per character, dynamic insertions and deletions can be handled in a
simple manner.

If we use the normal way of representing trees as binary trees, (1) becomes
the binary tree

    [Binary-tree diagram of (1): LLINK leads to the next character of a key, RLINK to the next alternative character]    (2)

(In the representation of the full forest, Fig. 31, we would also have a pointer
leading to the right from H to its neighboring root I.) The search in this binary
tree proceeds by comparing a character of the argument to the character in the
tree, and following RLINKs until finding a match; then the LLINK is taken and
we treat the next character of the argument in the same way.
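The search just described can be sketched in Python as follows. This is a minimal illustration, not de la Briandais's own code: the class names are hypothetical, keys are Python strings, and END = "\0" is an assumed end-of-word delimiter that must never occur inside a key. Each node holds one character, an LLINK to the first node for the next character, and an RLINK to the next alternative at the same character position, as in (2); a dummy header node stands in for the pointer to the first root of the forest.

```python
from typing import Optional

END = "\0"  # assumed end-of-word delimiter; must not occur inside any key


class Node:
    """One character of a key: LLINK = next character, RLINK = next alternative."""

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.llink: Optional["Node"] = None
        self.rlink: Optional["Node"] = None


class BriandaisTrie:
    """A de la Briandais forest stored as a binary tree (sketch)."""

    def __init__(self) -> None:
        self.head = Node("")  # dummy header; head.llink is the first root

    def insert(self, word: str) -> None:
        parent = self.head
        for ch in word + END:
            prev, node = None, parent.llink
            while node is not None and node.ch != ch:  # follow RLINKs
                prev, node = node, node.rlink
            if node is None:  # no match: append a new alternative
                node = Node(ch)
                if prev is None:
                    parent.llink = node
                else:
                    prev.rlink = node
            parent = node  # take the LLINK for the next character

    def search(self, word: str) -> bool:
        node = self.head.llink
        for ch in word + END:
            while node is not None and node.ch != ch:
                node = node.rlink  # unequal: try the next alternative
            if node is None:
                return False
            node = node.llink  # equal: go on to the next character
        return True
```

Insertion never moves existing nodes; a new alternative is simply appended at the end of an RLINK chain, which is one easy way (among several) to realize the dynamic insertion mentioned above.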

With such a binary tree, we are more or less doing a search by comparison,
with equal-unequal branching instead of less-greater branching. The elementary
theory of Section 6.2.1 tells us that we must make at least lg N comparisons, on
the average, to distinguish between N keys; the average number of tests made
when searching a tree like that of Fig. 31 must be at least as many as we make
when doing a binary search using the techniques of Section 6.2.

On the other hand, the trie in Table 1 is capable of making an M-way branch
all at once; we shall see that the average search time for large N involves only
about

    \log_M N = \lg N / \lg M

iterations, if the input data is random. We shall also see that a “pure” trie
scheme like that in Algorithm T requires a total of approximately N/ln M nodes
to distinguish between N random inputs; hence the total amount of space is
proportional to MN/ln M.

From these considerations it is clear that the trie idea pays off only in
the first few levels of the tree. We can get better performance by mixing two
strategies, using a trie for the first few characters and then switching to some
other technique. For example, E. H. Sussenguth, Jr. [CACM 6 (1963), 272-279]
suggested using a character-by-character scheme until we reach part of the
tree where only, say, six or fewer keys of the file are possible, and then we can
sequentially run through the short list of remaining keys. We shall see that this
mixed strategy decreases the number of trie nodes by roughly a factor of six,
without substantially changing the running time.
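A minimal sketch of such a mixed strategy for M = 2 follows. It assumes the keys are distinct nonnegative integers of a fixed bit width; the class name and the parameters s and width are illustrative, and this is not Sussenguth's own formulation. The trie branches on successive leading bits until at most s keys remain, and those keys are kept in a short list that is searched sequentially. The counter `nodes` records how many trie nodes (each with two links) were needed.

```python
import math
import random
from typing import List, Union

Tree = Union[list, tuple]  # list = short bucket searched sequentially; tuple = (left, right)


class HybridTrie:
    """Binary trie that stops branching once at most s keys remain (sketch)."""

    def __init__(self, keys: List[int], s: int, width: int) -> None:
        self.s, self.width, self.nodes = s, width, 0
        self.root = self._build(sorted(set(keys)), 0)

    def _bit(self, key: int, depth: int) -> int:
        return (key >> (self.width - 1 - depth)) & 1  # leading bit first

    def _build(self, keys: List[int], depth: int) -> Tree:
        if len(keys) <= self.s or depth == self.width:
            return keys  # short list: switch to sequential search
        self.nodes += 1  # one trie node with M = 2 links
        left = [k for k in keys if self._bit(k, depth) == 0]
        right = [k for k in keys if self._bit(k, depth) == 1]
        return (self._build(left, depth + 1), self._build(right, depth + 1))

    def search(self, key: int) -> bool:
        node, depth = self.root, 0
        while isinstance(node, tuple):  # trie part: branch on one bit
            node = node[self._bit(key, depth)]
            depth += 1
        return key in node  # sequential part


if __name__ == "__main__":
    sixteen = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
    print(HybridTrie(sixteen, 1, 10).nodes)  # 22
    random.seed(1)
    keys = [random.getrandbits(64) for _ in range(4000)]
    for s in (1, 6):
        t = HybridTrie(keys, s, 64)
        print(s, t.nodes, round(len(keys) / (s * math.log(2))))
```

With s = 1 this is a pure binary trie: the sixteen keys of the Chapter 5 examples (in decimal above), taken as 10-bit numbers, give the 22 nodes counted for Fig. 34 later in this section. For random keys, comparing `nodes` with N/(s ln 2) for s = 1 and s = 6 illustrates the reduction by roughly a factor of s.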

An interesting way to store large, growing tries in external memory was
suggested by S. Y. Berkovich in Doklady Akademii Nauk SSSR 202 (1972),
298-299 [English translation in Soviet Physics-Doklady 17 (1972), 20-21].

T. N. Turba [CACM 25 (1982), 522-526] points out that it is sometimes
most convenient to search for variable-length keys by having one search tree or
trie for each different length.

The binary case. Let us now consider the special case M = 2, in which we
scan the search argument one bit at a time. Two interesting methods have been
developed that are especially appropriate for this case.

The first method, which we call digital tree search, is due to E. G. Coffman
and J. Eve [CACM 13 (1970), 427-432, 436]. The idea is to store full keys
in the nodes just as we did in the tree search algorithm of Section 6.2.2, but
to use bits of the argument (instead of results of the comparisons) to govern
whether to take the left or right branch at each step. Figure 32 shows the binary
tree constructed by this method when we insert the 31 most common English
words in order of decreasing frequency. In order to provide binary data for this
illustration, the words have been expressed in MIX character code, and the codes
have been converted into binary numbers with 5 bits per byte. Thus, the word
WHICH is represented as the bit sequence 11010 01000 01001 00011 01000.

To search for this word WHICH in Fig. 32, we compare it first with the word
THE at the root of the tree. Since there is no match and since the first bit of
WHICH is 1, we move to the right and compare with OF. Since there is no match
and since the second bit of WHICH is 1, we move to the right and compare with
WITH; and so on. Alphabetic order of the keys in a digital search tree no longer
corresponds to symmetric order of the nodes.
Fig. 32. A digital search tree for the 31 most common English words, inserted in
decreasing order of frequency.


It is interesting to note the contrast between Fig. 32 and Fig. 12 in Section
6.2.2, since the latter tree was formed in the same way but using comparisons
instead of key bits for the branching. If we consider the given frequencies,
the digital search tree of Fig. 32 requires an average of 3.42 comparisons per
successful search; this is somewhat better than the 4.04 comparisons needed by
Fig. 12, although of course the computing time per comparison will probably be
different.

Algorithm D (Digital tree search and insertion). Given a table of records
that form a binary tree as described above, this algorithm searches for a given
argument K. If K is not in the table, a new node containing K is inserted into
the tree in the appropriate place.

This algorithm assumes that the tree is nonempty and that its nodes have
KEY, LLINK, and RLINK fields just as in Algorithm 6.2.2T. In fact, the two
algorithms are almost identical, as the reader may verify.

D1. [Initialize.] Set P ← ROOT, and K′ ← K.

D2. [Compare.] If K = KEY(P), the search terminates successfully. Otherwise
set b to the leading bit of K′, and shift K′ left one place (thereby removing
that bit and introducing a 0 at the right). If b = 0, go to D3, otherwise go
to D4.

D3. [Move left.] If LLINK(P) ≠ Λ, set P ← LLINK(P) and go back to D2.
Otherwise go to D5.

D4. [Move right.] If RLINK(P) ≠ Λ, set P ← RLINK(P) and go back to D2.

D5. [Insert into tree.] Set Q ⇐ AVAIL, KEY(Q) ← K, LLINK(Q) ← RLINK(Q) ← Λ.
If b = 0 set LLINK(P) ← Q, otherwise set RLINK(P) ← Q. ∎
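Algorithm D translates almost line for line into Python. The sketch below assumes that keys are distinct nonnegative integers of a fixed `width`, so that "the leading bit of K′" at depth d is bit width − 1 − d of K; it uses None for Λ and ordinary allocation in place of Q ⇐ AVAIL. The function names and the optional `trace` list are illustrative additions. The helper `mix_bits` packs a word into the 5-bit MIX character codes used for Fig. 32 (blank = 0, A to I = 1 to 9, J to R = 11 to 19, S to Z = 22 to 29); padding each word with blanks to five characters is an assumption about how the keys were aligned, and the word list in the demo is the frequency order of Section 6.2.2 as assumed here.

```python
from typing import List, Optional

# 5-bit MIX character codes for letters (blank = 0; Delta, Sigma, Pi omitted)
MIX = {c: i + 1 for i, c in enumerate("ABCDEFGHI")}
MIX.update({c: i + 11 for i, c in enumerate("JKLMNOPQR")})
MIX.update({c: i + 22 for i, c in enumerate("STUVWXYZ")})


def mix_bits(word: str, length: int = 5) -> int:
    """Pack a word, blank-padded to `length` characters, at 5 bits per character."""
    value = 0
    for ch in word.ljust(length):
        value = (value << 5) | MIX.get(ch, 0)
    return value


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.llink: Optional["Node"] = None  # None plays the role of Λ
        self.rlink: Optional["Node"] = None


def digital_search(root: Node, k: int, width: int,
                   trace: Optional[List[int]] = None) -> Node:
    """Algorithm D: return the node holding k, inserting k if it is absent."""
    p, shift = root, width - 1          # D1: `shift` marks the leading bit of K'
    while True:
        if trace is not None:
            trace.append(p.key)
        if k == p.key:                  # D2: successful search
            return p
        b = (k >> shift) & 1            # leading bit of K' ...
        shift -= 1                      # ... which is then shifted away
        nxt = p.rlink if b else p.llink # D3 or D4
        if nxt is None:                 # D5: insert a new node Q
            q = Node(k)
            if b:
                p.rlink = q
            else:
                p.llink = q
            return q
        p = nxt


def level_counts(root: Node) -> List[int]:
    """Number of nodes on levels 0, 1, 2, ... of the tree."""
    counts, level = [], [root]
    while level:
        counts.append(len(level))
        level = [c for n in level for c in (n.llink, n.rlink) if c is not None]
    return counts


if __name__ == "__main__":
    keys = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
    root = Node(keys[0])
    for k in keys[1:]:
        digital_search(root, k, 10)
    print(level_counts(root))  # [1, 2, 4, 5, 4]

    words = ("THE OF AND TO A IN THAT IS I IT FOR AS WITH WAS HIS HE BE NOT BY "
             "BUT HAVE YOU WHICH ARE ON OR HER HAD AT FROM THIS").split()
    wroot = Node(mix_bits(words[0]))
    for w in words[1:]:
        digital_search(wroot, mix_bits(w), 25)
    path: List[int] = []
    digital_search(wroot, mix_bits("WHICH"), 25, path)
    names = {mix_bits(w): w for w in words}
    print([names[k] for k in path][:3])  # ['THE', 'OF', 'WITH']
```

The first demo inserts the sixteen keys of the Chapter 5 examples as 10-bit numbers; its level profile is the a(z) = 1 + 2z + 4z² + 5z³ + 4z⁴ quoted for Fig. 35 later in this section. The second reproduces the start of the search path for WHICH described above.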
Although the tree search of Algorithm 6.2.2T is inherently binary, it is not
difficult to see that the present algorithm could be extended to an M-ary digital
search for any M ≥ 2 (see exercise 13).

Donald R. Morrison [JACM 15 (1968), 514-534] has discovered a very pretty
way to form N-node search trees based on the binary representation of keys,
without storing keys in the nodes. His method, called “Patricia” (Practical
Algorithm To Retrieve Information Coded In Alphanumeric), is especially suitable
for dealing with extremely long, variable-length keys such as titles or phrases
stored within a large bulk file. A closely related algorithm was published at
almost exactly the same time in Germany by G. Gwehenberger, Elektronische
Rechenanlagen 10 (1968), 223-226.

Patricia’s basic idea is to build a binary trie, but to avoid one-way branching
by including in each node the number of bits to skip over before making the next
test. There are several ways to exploit this idea; perhaps the simplest to explain
is illustrated in Fig. 33. We have a TEXT array of bits, which is usually quite
long; it may be stored as an external direct-access file, since each search accesses
TEXT only once. Each key to be stored in our table is specified by a starting
place in the text, and it can be imagined to go from this starting place all the
way to the end of the text. (Patricia does not search for strict equality between
key and argument; instead, it will determine whether or not there exists a key
beginning with the argument.)

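THIS␣IS␣THE␣HOUSE␣THAT␣JACK␣BUILT?

10111 01000 01001 10110 00000 01001 10110 00000 10111 01000 00101 00000
01000 10000 11000 10110 00101 00000 10111 01000 00001 10111 00000 01011
00001 00011 01100 00000 00010 11000 01001 01101 10111 11111

[Tree diagram: a header node above Patricia's tree, with the SKIP value shown in each non-header node and dotted lines for tagged links]

Fig. 33. An example of Patricia’s tree and TEXT.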


The situation depicted in Fig. 33 involves seven keys, one starting at each
word, namely “THIS IS THE HOUSE THAT JACK BUILT?” and “IS THE HOUSE THAT
JACK BUILT?” and . . . and “BUILT?”. There is one important restriction, namely
that no one key may be a prefix of another; this restriction can be met if we end
the text with a unique end-of-text code (in this case “?”) that appears nowhere
else. The same restriction was implicit in the trie scheme of Algorithm T, where
“␣” was the termination code.

The tree that Patricia uses for searching should be contained in random-access
memory, or it should be arranged on pages as suggested in Section 6.2.4.
It consists of a header and N − 1 nodes, where the nodes contain several fields:

KEY, a pointer to the text. This field must be at least lg C bits long, if the
text contains C characters. In Fig. 33 the words shown within each node
would really be represented by pointers to the text; for example, instead
of “(JACK)” the node would contain the number 24 (which indicates the
starting place of “JACK BUILT?” in the text string).

LLINK and RLINK, pointers within the tree. These fields must be at least
lg N bits long.

LTAG and RTAG, one-bit fields that tell whether or not LLINK and RLINK,
respectively, are pointers to children or to ancestors of the node. The
dotted lines in Fig. 33 correspond to pointers whose TAG bit is 1.

SKIP, a number that tells how many bits to skip when searching, as explained
below. This field should be large enough to hold the largest number k
such that all keys with prefix α agree in the next k bits following α, for
some string α that is a prefix of at least two different keys; in practice,
we may usually assume that k isn’t too large, and an error indication
can be given if the size of the SKIP field is exceeded. The SKIP fields
are shown as numbers within each non-header node of Fig. 33.

The header contains only KEY, LLINK, and LTAG fields.

A search in Patricia’s tree is carried out as follows: Suppose we are looking
up the word THE (bit pattern 10111 01000 00101). We start by looking at the
SKIP field of the root node α, which tells us to examine bit 1 of the argument.
That bit is 1, so we move to the right. The SKIP field in the next node, γ, tells
us to look at the 1 + 11 = 12th bit of the argument. It is 0, so we move to the
left. The SKIP field of the next node, ε, tells us to look at the (12 + 1)st bit,
which is 1; now we find RTAG = 1, so we go back to node γ, which refers us to
the TEXT. The search path we have taken would occur for any argument whose
bit pattern is 1xxxx xxxxx x01 . . . , and we must check to see if it matches the
unique key beginning with that pattern, namely THE.

Suppose, on the other hand, that we are looking for any or all keys starting
with TH. The search process begins as above, but it eventually tries to look at
the (nonexistent) 12th bit of the 10-bit argument. At this point we compare the
argument to the TEXT at the point specified in the current node (in this case
node γ). If it does not match, the argument is not the beginning of any key;
but if it does match, the argument is the beginning of every key represented by
dotted links in node γ and its descendants (namely THIS, THAT, THE).
The search process can be spelled out more precisely in the following way.

Algorithm P (Patricia). Given a TEXT array and a tree with KEY, LLINK, RLINK,
LTAG, RTAG, and SKIP fields, as described above, this algorithm determines
whether or not there is a key in the TEXT that begins with a specified argument K.
(If r such keys exist, for r ≥ 1, it is subsequently possible to locate them all in
O(r) steps; see exercise 14.) We assume that at least one key is present.

P1. [Initialize.] Set P ← HEAD and j ← 0. (Variable P is a pointer that will move
down the tree, and j is a counter that will designate bit positions of the
argument.) Set n ← number of bits in K.

P2. [Move left.] Set Q ← P and P ← LLINK(Q). If LTAG(Q) = 1, go to P6.

P3. [Skip bits.] (At this point we know that if the first j bits of K match any key
whatsoever, they match the key that starts at KEY(P).) Set j ← j + SKIP(P).
If j > n, go to P6.

P4. [Test bit.] (At this point we know that if the first j − 1 bits of K match any
key, they match the key starting at KEY(P).) If the jth bit of K is 0, go to
P2, otherwise go to P5.

P5. [Move right.] Set Q ← P and P ← RLINK(Q). If RTAG(Q) = 0, go to P3.

P6. [Compare.] (At this point we know that if K matches any key, it matches
the key starting at KEY(P).) Compare K to the key that starts at position
KEY(P) in the TEXT array. If they are equal (up to n bits, the length of K),
the algorithm terminates successfully; if unequal, it terminates unsuccessfully. ∎

Exercise 15 shows how Patricia’s tree can be built in the first place. We can
also add to the text and insert new keys, provided that the new text material
always ends with a unique delimiter (for example, an end-of-text symbol followed
by a serial number).

Patricia is a little tricky, and she requires careful scrutiny before all of her
beauties are revealed.
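The following Python sketch conveys the idea behind Algorithm P, but it is a variant rather than a transcription, and all names are hypothetical. Each branch node stores the absolute index of the bit it tests (the text's SKIP field corresponds to the difference between a node's index and its parent's), keys live in explicit leaves instead of being reached through tagged links to ancestors, bits are numbered from 0, and characters are coded as 8-bit ASCII rather than MIX code. As in the text, no key may be a prefix of another. The `insert` method is one way to build the tree incrementally; it is not necessarily the method intended in exercise 15.

```python
from typing import List, Optional, Union


def to_bits(s: str) -> str:
    """Assumed coding: 8-bit ASCII per character (the text itself uses MIX codes)."""
    return "".join(format(ord(c), "08b") for c in s)


class Leaf:
    def __init__(self, bits: str, label: object) -> None:
        self.bits, self.label = bits, label


class Branch:
    def __init__(self, index: int, left: "Tree", right: "Tree") -> None:
        # index = absolute position of the bit tested here (SKIP = index - parent's index)
        self.index, self.left, self.right = index, left, right


Tree = Union[Leaf, Branch]


def bit(bits: str, i: int) -> str:
    return bits[i] if i < len(bits) else "0"  # positions past the end read as 0


class Patricia:
    """Crit-bit style variant of Patricia; keys must be distinct and prefix-free."""

    def __init__(self) -> None:
        self.root: Optional[Tree] = None

    def insert(self, bits: str, label: object) -> None:
        if self.root is None:
            self.root = Leaf(bits, label)
            return
        node = self.root
        while isinstance(node, Branch):  # find the key agreeing on every tested bit
            node = node.right if bit(bits, node.index) == "1" else node.left
        crit = next(i for i, (x, y) in enumerate(zip(bits, node.bits)) if x != y)
        parent, went_right, node = None, False, self.root
        while isinstance(node, Branch) and node.index < crit:
            parent, went_right = node, bits[node.index] == "1"
            node = node.right if went_right else node.left
        leaf = Leaf(bits, label)
        new = Branch(crit, node, leaf) if bits[crit] == "1" else Branch(crit, leaf, node)
        if parent is None:
            self.root = new
        elif went_right:
            parent.right = new
        else:
            parent.left = new

    def prefix_matches(self, arg: str) -> List[object]:
        """Labels of all keys that begin with the bit string arg (cf. P1-P6)."""
        n, node = len(arg), self.root
        while isinstance(node, Branch) and node.index < n:  # stop once j > n (P3)
            node = node.right if arg[node.index] == "1" else node.left
        if node is None:
            return []
        probe = node
        while isinstance(probe, Branch):  # any key in this subtree will do
            probe = probe.left
        if probe.bits[:n] != arg:  # P6: one comparison against a full key
            return []
        found, stack = [], [node]  # every key below shares the prefix (cf. exercise 14)
        while stack:
            t = stack.pop()
            if isinstance(t, Leaf):
                found.append(t.label)
            else:
                stack += [t.left, t.right]
        return found


def build_from_text(text: str) -> Patricia:
    """One key per word, running from the word's start to the end of the text."""
    tree = Patricia()
    for start in [0] + [i + 1 for i, c in enumerate(text) if c == " "]:
        tree.insert(to_bits(text[start:]), start)
    return tree


def leaf_depths(node: Tree, depth: int = 0) -> List[int]:
    if isinstance(node, Leaf):
        return [depth]
    return leaf_depths(node.left, depth + 1) + leaf_depths(node.right, depth + 1)
```

For example, `build_from_text("THIS IS THE HOUSE THAT JACK BUILT?").prefix_matches(to_bits("TH"))` returns the starting positions 0, 8, and 18 (in some order) of THIS, THE, and THAT; positions here count from 0, which is why JACK starts at 23 rather than at 24 as in the text. Because the shape of such a tree depends only on the set of keys, inserting the sixteen 10-bit keys of the Chapter 5 examples reproduces the leaf-depth profile b(z) = 3z³ + 8z⁴ + 3z⁵ + 2z⁶ given for Fig. 36 later in this section.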

Analyses of the algorithms. We shall conclude this section by making a
mathematical study of tries, digital search trees, and Patricia. A summary of
the main consequences of these analyses appears at the very end.

Let us consider first the case of binary tries, namely tries with M = 2.
Figure 34 shows the binary trie that is formed when the sixteen keys from the
sorting examples of Chapter 5 are treated as 10-bit binary numbers. (The keys
are shown in octal notation, so that for example 1144 represents the 10-bit
number 612 = (1001100100)₂.) As in Algorithm T, we use the trie to store
information about the leading bits of the keys until we get to the first point
where the key is uniquely identified; then the key is recorded in full.

If Fig. 34 is compared to Table 5.2.2-3, an amazing relationship between
trie memory and radix exchange sorting is revealed. (Then again, perhaps this
relationship is obvious.) The 22 nodes of Fig. 34 correspond precisely to the 22
partitioning stages in Table 5.2.2-3, with the pth node in preorder corresponding
to Stage p. The number of bit inspections in a partitioning stage is equal to the
number of keys within the corresponding node and its subtries; consequently we
may state the following result.

Fig. 34. Example of a random binary trie.

Theorem T. If N distinct binary numbers are put into a binary trie as described
above, then (i) the number of nodes of the trie is equal to the number of
partitioning stages required if these numbers are sorted by radix exchange; and
(ii) the average number of bit inspections required to retrieve a key by means of
Algorithm T is 1/N times the number of bit inspections required by the radix
exchange sort. ∎

Because of this theorem, we can make use of all the mathematical machinery
that was developed for radix exchange in Section 5.2.2. For example, if we
assume that our keys are infinite-precision random uniformly distributed real
numbers between 0 and 1, the number of bit inspections needed for retrieval will
be lg N + γ/ln 2 + 1/2 + δ(N) + O(N^{−1}), and the number of trie nodes will be
N/ln 2 + Nδ(N) + O(1). Here each δ(N) is a complicated function (a different
one in each formula) that may be neglected, since its absolute value is always
less than 10^{−6} (see exercises 5.2.2-38 and 5.2.2-48).

Of course there is still more work to be done, since we need to generalize
from binary tries to M-ary tries. We shall describe only the starting point of
the investigations here, leaving the instructive details as exercises.

Let A_N be the average number of internal nodes in a random M-ary search
trie that contains N keys. Then A_0 = A_1 = 0, and for N ≥ 2 we have

    A_N = 1 + \sum_{k_1+\cdots+k_M=N} \left(\frac{N!}{k_1!\cdots k_M!}\,M^{-N}\right)\left(A_{k_1}+\cdots+A_{k_M}\right),    (3)

since N! M^{−N}/(k_1! ⋯ k_M!) is the probability that k_1 of the keys are in the first
subtrie, . . . , k_M in the Mth. This equation can be rewritten

    A_N = 1 + M^{1-N} \sum_{k_1+\cdots+k_M=N} \frac{N!}{k_1!\cdots k_M!}\,A_{k_1}
        = 1 + M^{1-N} \sum_{k} \binom{N}{k} (M-1)^{N-k} A_k, \qquad N \ge 2,    (4)

by using symmetry and then summing over k_2, . . . , k_M. Similarly, if C_N denotes
the average total number of digit inspections needed to look up all N keys in the
trie, we find C_0 = C_1 = 0 and

    C_N = N + M^{1-N} \sum_{k} \binom{N}{k} (M-1)^{N-k} C_k, \qquad N \ge 2.    (5)


Exercise 17 shows how to deal with general recurrences of this type, and exercises
18-25 work out the corresponding theory of random tries. [The analysis of A_N
was first approached from another point of view by L. R. Johnson and M. H.
McAndrew, IBM J. Res. and Devel. 8 (1964), 189-193, in connection with an
equivalent hardware-oriented sorting algorithm.]
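Both recurrences are easy to evaluate numerically. The following is a minimal sketch (the function name is assumed, and floating point is chosen for simplicity) that solves (4) and (5) for A_N and C_N in turn; because the k = N term on the right-hand side contains the unknown itself, each step divides by 1 − M^{1−N}. The weight M^{1−N} binom(N, k)(M − 1)^{N−k} is computed as M times a binomial probability, which keeps intermediate numbers moderate.

```python
import math


def trie_averages(n_max: int, M: int = 2):
    """A_N (internal trie nodes) and C_N (total digit inspections), from (4) and (5)."""
    A = [0.0] * (n_max + 1)  # A_0 = A_1 = 0
    C = [0.0] * (n_max + 1)  # C_0 = C_1 = 0
    p = 1.0 / M
    for N in range(2, n_max + 1):
        # M^(1-N) * binom(N, k) * (M-1)^(N-k) == M * P(binomial(N, 1/M) = k)
        w = [M * math.comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(N + 1)]
        rest_a = sum(w[k] * A[k] for k in range(N))
        rest_c = sum(w[k] * C[k] for k in range(N))
        # The k = N term contains the unknown A_N (or C_N) itself, so solve for it.
        A[N] = (1 + rest_a) / (1 - w[N])
        C[N] = (N + rest_c) / (1 - w[N])
    return A, C


if __name__ == "__main__":
    A, C = trie_averages(200)
    print(A[200] / 200, 1 / math.log(2))           # nodes per key
    print(C[200] / 200, math.log2(200) + 1.33275)  # bit inspections per search
```

For M = 2 the values can be compared with the asymptotic formulas quoted after Theorem T: A_N/N should approach 1/ln 2 ≈ 1.4427, and C_N/N should be close to lg N + γ/ln 2 + 1/2 ≈ lg N + 1.33275.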

If we now turn to a study of digital search trees, we find that the formulas
are similar, yet different enough that it is not easy to see how to deduce the
asymptotic behavior. For example, if C_N denotes the average total number of
digit inspections made when looking up all N keys in an M-ary digital search
tree, it is not difficult to deduce as above that C_0 = C_1 = 0, and

    C_{N+1} = N + M^{1-N} \sum_{k} \binom{N}{k} (M-1)^{N-k} C_k, \qquad N \ge 0.    (6)

This is almost identical to Eq. (5); but the appearance of N + 1 instead of N
on the left-hand side of this equation is enough to change the entire character of
the recurrence, so the methods we have used to study (5) are wiped out.

Let’s consider the binary case first. Figure 35 shows the digital search tree
corresponding to the sixteen example keys of Fig. 34, when they have been
inserted in the order used in the examples of Chapter 5. If we want to determine
the average number of bit inspections made in a random successful search, this
is just the internal path length of the tree divided by N, since we need l bit
inspections to find a node on level l. Notice, however, that the average number
of bit inspections made in a random unsuccessful search is not simply related to
the external path length of the tree, since unsuccessful searches are more likely
to occur at external nodes near the root; thus, the probability of reaching the left
sub-branch of node 0075 in Fig. 35 is 1/8 (assuming infinitely precise keys), and
the left sub-branch of node 0232 will be encountered with probability only 1/32.
For this reason, digital search trees tend to stay better balanced than the binary
search trees of Algorithm 6.2.2T, when the keys are uniformly distributed.
We can use a generating function to describe the pertinent characteristics
of a digital search tree. If there are a_l internal nodes on level l, consider
the generating function a(z) = Σ_l a_l z^l; for example, the generating function
corresponding to Fig. 35 is a(z) = 1 + 2z + 4z² + 5z³ + 4z⁴. If there are b_l
external nodes on level l, and if b(z) = Σ_l b_l z^l, we have

    b(z) = 1 + (2z - 1)\,a(z)    (7)

by exercise 6.2.1-25. For example, 1 + (2z − 1)(1 + 2z + 4z² + 5z³ + 4z⁴) =
3z³ + 6z⁴ + 8z⁵. The average number of bit inspections made in a random
successful search is a′(1)/a(1), since a′(1) is the internal path length of the tree
and a(1) is the number of internal nodes. The average number of bit inspections
made in a random unsuccessful search is Σ_l l b_l 2^{−l} = ½b′(½) = a(½) (the last
step because b′(z) = 2a(z) + (2z − 1)a′(z)), since we end up at a given external
node on level l with probability 2^{−l}. The number of comparisons is the same as
the number of bit inspections, plus one in a successful search. For example, in
Fig. 35, a successful search will take 2 9/16 bit inspections and 3 9/16 comparisons,
on the average; an unsuccessful search will take 3 7/8 of each.
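The arithmetic in this example can be checked mechanically. Here is a small sketch that represents each generating function by its list of coefficients (index = level); the function names are illustrative. `mean_level` gives p′(1)/p(1) for a coefficient list p, and `random_miss` gives Σ_l l b_l 2^{−l}, using exact fractions.

```python
from fractions import Fraction
from typing import List


def external_from_internal(a: List[int]) -> List[int]:
    """Coefficients of b(z) = 1 + (2z - 1) a(z), eq. (7); index = level."""
    b = [0] * (len(a) + 1)
    b[0] = 1
    for level, count in enumerate(a):
        b[level + 1] += 2 * count  # the 2z * a(z) part
        b[level] -= count          # the -a(z) part
    return b


def mean_level(counts: List[int]) -> Fraction:
    """Path length divided by the number of nodes, i.e. p'(1)/p(1)."""
    return Fraction(sum(l * c for l, c in enumerate(counts)), sum(counts))


def random_miss(b: List[int]) -> Fraction:
    """Sum over levels of l * b_l * 2^(-l), i.e. b'(1/2)/2."""
    return sum((Fraction(l * c, 2 ** l) for l, c in enumerate(b)), Fraction(0))


if __name__ == "__main__":
    a = [1, 2, 4, 5, 4]                            # Fig. 35
    b = external_from_internal(a)                  # [0, 0, 0, 3, 6, 8]
    print(mean_level(a), random_miss(b))           # 41/16 31/8
    fig36 = [0, 0, 0, 3, 8, 3, 2]                  # Patricia, Fig. 36
    print(mean_level(fig36), random_miss(fig36))   # 17/4 121/32
```

The same two functions apply to Patricia's external nodes later in this section, where b(z) = 3z³ + 8z⁴ + 3z⁵ + 2z⁶ for Fig. 36 and the successful-search average is the mean level of the external nodes.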

Now let g_N(z) be the “average” a(z) for trees with N nodes; in other words,
g_N(z) is the sum Σ_T p_T a_T(z) over all binary digital search trees T with N internal
nodes, where a_T(z) is the generating function for the internal nodes of T and
p_T is the probability that T occurs when N random numbers are inserted using
Algorithm D. Then the average number of bit inspections will be g_N′(1)/N in a
successful search, g_N(½) in an unsuccessful search.

We can compute g_N(z) by mimicking the tree construction process, as
follows. If a(z) is the generating function for a tree of N nodes, we can form
N + 1 trees from it by making the next insertion into any one of the external node
positions. The insertion goes into a given external node on level l with probability
2^{−l}; hence the sum of the generating functions for the N + 1 new trees, multiplied
by the probability of occurrence, is a(z) + b(½z) = a(z) + 1 + (z − 1)a(½z).
(Each new tree has generating function a(z) + z^l, and the probabilities 2^{−l} of
the N + 1 positions sum to 1.)
Averaging over all trees for N nodes, it follows that

    g_{N+1}(z) = g_N(z) + 1 + (z - 1)\,g_N(\tfrac12 z); \qquad g_0(z) = 0.    (8)

The corresponding generating function for external nodes,

    h_N(z) = 1 + (2z - 1)\,g_N(z),

is somewhat easier to work with, because (8) is equivalent to the formula

    h_{N+1}(z) = h_N(z) + (2z - 1)\,h_N(\tfrac12 z); \qquad h_0(z) = 1.    (9)

Applying this rule repeatedly, we find that

    h_{N+1}(z) = h_{N-1}(z) + 2(2z-1)\,h_{N-1}(\tfrac12 z) + (2z-1)(z-1)\,h_{N-1}(\tfrac14 z)
               = h_{N-2}(z) + 3(2z-1)\,h_{N-2}(\tfrac12 z) + 3(2z-1)(z-1)\,h_{N-2}(\tfrac14 z)
                 + (2z-1)(z-1)(\tfrac12 z-1)\,h_{N-2}(\tfrac18 z)

and so on, so that eventually we have

    h_N(z) = \sum_{k} \binom{N}{k} \prod_{j=0}^{k-1} (2^{1-j} z - 1);    (10)

    g_N(z) = \sum_{k \ge 0} \binom{N}{k+1} \prod_{j=0}^{k-1} (2^{-j} z - 1).    (11)

For example, g_4(z) = 4 + 6(z − 1) + 4(z − 1)(½z − 1) + (z − 1)(½z − 1)(¼z − 1).
These formulas make it possible to express the quantities we are looking for as
sums of products (in (12), only the derivative of the factor z − 1 survives at z = 1):

    C_N = g_N'(1) = \sum_{k \ge 0} \binom{N}{k+2} \prod_{j=1}^{k} (2^{-j} - 1);    (12)

    g_N(\tfrac12) = \sum_{k \ge 0} \binom{N}{k+1} \prod_{j=1}^{k} (2^{-j} - 1) = C_{N+1} - C_N.    (13)

It is not at all obvious that this formula for C_N satisfies (6)!
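Although the agreement is not obvious by inspection, it can at least be checked for small N in exact rational arithmetic. The sketch below (helper names assumed; polynomials are coefficient lists of Fractions) builds g_N(z) both by the recurrence (8) and by the closed form (11), and compares g_N′(1) and g_N(½) with (12), (13), and recurrence (6) for M = 2. Such a check is evidence for small N, not a proof.

```python
from fractions import Fraction
from math import comb
from typing import List

Poly = List[Fraction]  # coefficients of z^0, z^1, z^2, ...


def add(p: Poly, q: Poly) -> Poly:
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def mul(p: Poly, q: Poly) -> Poly:
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return r


def at_scaled(p: Poly, c: Fraction) -> Poly:
    """Coefficients of p(c z)."""
    return [x * c ** i for i, x in enumerate(p)]


def trim(p: Poly) -> Poly:
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def value(p: Poly, z: Fraction) -> Fraction:
    return sum((x * z ** i for i, x in enumerate(p)), Fraction(0))


def slope_at_1(p: Poly) -> Fraction:
    return sum((i * x for i, x in enumerate(p)), Fraction(0))


def g_recurrence(N: int) -> Poly:
    """Eq. (8): g_{N+1}(z) = g_N(z) + 1 + (z - 1) g_N(z/2), with g_0 = 0."""
    g: Poly = [Fraction(0)]
    for _ in range(N):
        shifted = mul([Fraction(-1), Fraction(1)], at_scaled(g, Fraction(1, 2)))
        g = add(add(g, [Fraction(1)]), shifted)
    return g


def g_closed(N: int) -> Poly:
    """Eq. (11): sum over k of binom(N, k+1) * prod_{j<k} (2^-j z - 1)."""
    total: Poly = [Fraction(0)]
    prod: Poly = [Fraction(1)]
    for k in range(N):
        total = add(total, [comb(N, k + 1) * x for x in prod])
        prod = mul(prod, [Fraction(-1), Fraction(1, 2 ** k)])
    return total


def C_closed(N: int) -> Fraction:
    """Eq. (12): sum over k of binom(N, k+2) * prod_{j=1}^{k} (2^-j - 1)."""
    total, prod = Fraction(0), Fraction(1)
    for k in range(N - 1):
        total += comb(N, k + 2) * prod
        prod *= Fraction(1, 2 ** (k + 1)) - 1
    return total


def C_recurrence(n_max: int) -> List[Fraction]:
    """Eq. (6) with M = 2."""
    C = [Fraction(0)] * (n_max + 1)
    for N in range(n_max):
        s = sum((comb(N, k) * C[k] for k in range(N + 1)), Fraction(0))
        C[N + 1] = N + Fraction(2) ** (1 - N) * s
    return C
```

Checks of this kind pass for every N tried in the accompanying tests (N < 12), and also confirm g_N(0) = 1 and g_N(1) = N, which say that a tree with N ≥ 1 nodes has one node on level 0 and N nodes in all.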

Unfortunately, these expressions are not suitable for calculation or for finding
an asymptotic expansion, since 2^{−j} − 1 is negative; we get large terms and a lot
of cancellation. A more useful formula for C_N can be obtained by applying the
partition identities of exercise 5.1.1-16. We have

    C_N = \Bigl(\prod_{i \ge 1} (1 - 2^{-i})\Bigr) \sum_{k \ge 0} \binom{N}{k+2} (-1)^k \prod_{j \ge 0} (1 - 2^{-j-k-1})^{-1}
        = \Bigl(\prod_{i \ge 1} (1 - 2^{-i})\Bigr) \sum_{k \ge 0} \binom{N}{k+2} (-1)^k \sum_{m \ge 0} \frac{(2^{-k-1})^m}{\prod_{r=1}^{m} (1 - 2^{-r})}
        = \sum_{m \ge 0} 2^m \bigl((1 - 2^{-m})^N - 1 + 2^{-m} N\bigr) \prod_{j \ge 0} (1 - 2^{-j-m-1})
        = \sum_{m \ge 0} 2^m \bigl((1 - 2^{-m})^N - 1 + 2^{-m} N\bigr) \sum_{n \ge 0} \frac{(-2^{-m-1})^n \, 2^{-n(n-1)/2}}{\prod_{r=1}^{n} (1 - 2^{-r})}.    (14)

This may not seem at first glance to be an improvement over Eq. (12), but it
has the great advantage that the sum on m converges rapidly for each fixed n.
A precisely analogous situation occurred for the trie case in Eqs. 5.2.2-(38) and
5.2.2-(39); in fact, if we consider only the terms of (14) with n = 0, we have
exactly N − 1 plus the number of bit inspections in a binary trie. We can now
proceed to get the asymptotic value in essentially the same way as before; see
exercise 27. [The derivation above is largely based on an approach suggested by
A. J. Konheim and D. J. Newman, Discrete Mathematics 4 (1973), 57-63.]
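The practical difference between (12) and (14) shows up numerically. The sketch below (names and truncation points are assumptions) evaluates (14) in floating point with both infinite sums truncated, and compares it with (12) evaluated in exact rational arithmetic. The factor 2^m((1 − 2^{−m})^N − 1 + 2^{−m}N) is itself computed exactly, because subtracting nearly equal floating-point numbers there would reintroduce the very cancellation that (14) is meant to avoid.

```python
from fractions import Fraction
from math import comb


def C_exact(N: int) -> Fraction:
    """Eq. (12) in exact rational arithmetic, used as a reference value."""
    total, prod = Fraction(0), Fraction(1)
    for k in range(N - 1):
        total += comb(N, k + 2) * prod
        prod *= Fraction(1, 2 ** (k + 1)) - 1
    return total


def C_series(N: int, m_terms: int = 80, n_terms: int = 30) -> float:
    """Eq. (14) in floating point, with both infinite sums truncated."""
    total = 0.0
    for m in range(m_terms):
        # 2^m ((1 - 2^-m)^N - 1 + 2^-m N), computed exactly, then rounded once
        x = Fraction(1, 2 ** m)
        outer = float(((1 - x) ** N - 1 + x * N) / x)
        # inner sum over n; it equals prod_{j>=0} (1 - 2^(-j-m-1))
        inner, denom = 0.0, 1.0
        for n in range(n_terms):
            if n > 0:
                denom *= 1 - 2.0 ** -n  # prod_{r=1}^{n} (1 - 2^-r)
            inner += (-(2.0 ** (-m - 1))) ** n * 2.0 ** (-n * (n - 1) / 2) / denom
        total += outer * inner
    return total


if __name__ == "__main__":
    for N in (2, 10, 20):
        print(N, C_series(N), float(C_exact(N)))
    print(C_series(1000) / 1000)  # average bit inspections per successful search
```

Every term of the outer sum in (14) is nonnegative, so `C_series` remains usable for large N (say N = 1000), where a floating-point evaluation of the alternating sum (12) would be swamped by cancellation.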


Finally let us take a mathematical look at Patricia. In her case the binary
tree is like the corresponding binary trie on the same keys, but squashed together
(because the SKIP fields eliminate 1-way branching), so that there are always
exactly N − 1 internal nodes and N external nodes. Figure 36 shows the Patrician
tree corresponding to the sixteen keys in the trie of Fig. 34. The number shown
in each branch node is the amount of SKIP; the keys are indicated with the
external nodes, although the external node is not explicitly present (there is
actually a tagged link to an internal node that references the TEXT, in place of
each external node). For the purposes of analysis, we may assume that external
nodes exist as shown.

Since successful searches with Patricia end at external nodes, the average
number of bit inspections made in a random successful search will be the external
path length, divided by N. If we form the generating function b(z) for external
nodes as above, this will be b′(1)/b(1). An unsuccessful search with Patricia also
ends at an external node, but weighted with probability 2^{−l} for external nodes
on level l, so the average number of bit inspections is ½b′(½). For example, in
Fig. 36 we have b(z) = 3z³ + 8z⁴ + 3z⁵ + 2z⁶; therefore there are 4 1/4 bit inspections
per successful search and 3 25/32 per unsuccessful search, on the average.

Let h_n(z) be the “average” b(z) for a Patrician tree constructed with n
external nodes, using uniformly distributed keys. The recurrence relation

    h_n(z) = 2^{1-n} \sum_{k} \binom{n}{k} h_k(z) \bigl(z + \delta_{kn}(1 - z)\bigr), \qquad h_0(z) = 0, \quad h_1(z) = 1,    (15)

appears to have no simple solution. But fortunately, there is a simple recurrence
for the average external path length h_n′(1), since, using h_k(1) = k,

    h_n'(1) = 2^{1-n} \sum_{k} \binom{n}{k} h_k'(1) + 2^{1-n} \sum_{k} \binom{n}{k} k (1 - \delta_{kn})
            = n - 2^{1-n} n + 2^{1-n} \sum_{k} \binom{n}{k} h_k'(1).    (16)

Since this has the form of (6), we can use the methods already developed to solve
for h_n′(1), which turns out to be exactly n less than the corresponding number
of bit inspections in a random binary trie. Thus, the SKIP fields save us about
one bit inspection per successful search, on random data. (See exercise 31.) The
redundancy of typical real data will lead to greater savings.
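The claim that h_n′(1) is exactly n less than the trie value can be spot-checked with exact fractions. Here is a sketch, assuming the function names below, that solves (5) with M = 2 and (16) side by side and compares them for n ≥ 2 (for n = 1 both quantities are 0).

```python
from fractions import Fraction
from math import comb


def trie_totals(n_max: int):
    """Eq. (5) with M = 2: total bit inspections to look up all n keys of a binary trie."""
    C = [Fraction(0)] * (n_max + 1)  # C_0 = C_1 = 0
    for n in range(2, n_max + 1):
        w = Fraction(1, 2 ** (n - 1))  # 2^(1-n)
        rest = sum((comb(n, k) * C[k] for k in range(n)), Fraction(0))
        C[n] = (n + w * rest) / (1 - w)  # the k = n term holds C_n itself
    return C


def patricia_totals(n_max: int):
    """Eq. (16): h'_n(1), the average external path length of Patricia's tree."""
    H = [Fraction(0)] * (n_max + 1)  # h'_0(1) = h'_1(1) = 0
    for n in range(2, n_max + 1):
        w = Fraction(1, 2 ** (n - 1))
        rest = sum((comb(n, k) * H[k] for k in range(n)), Fraction(0))
        H[n] = (n - w * n + w * rest) / (1 - w)
    return H


if __name__ == "__main__":
    C, H = trie_totals(12), patricia_totals(12)
    print([int(C[n] - H[n]) for n in range(2, 13)])  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Dividing by n, the difference of exactly n in total external path length is the saving of one bit inspection per successful search mentioned above.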

When we try to find the average number of bit inspections for a random
unsuccessful search by Patricia, we obtain the recurrence

    a_n = 1 + \frac{1}{2^n - 2} \sum_{k<n} \binom{n}{k} a_k, \qquad n \ge 2; \qquad a_0 = a_1 = 0.    (17)

Here a_n = ½h_n′(½). This does not have the form of any recurrence we have
studied, nor is it easily transformed into such a recurrence. The theory of Mellin
transforms, introduced in Section 5.2.2 and in the references cited there, provides
a high-level way to deal with recurrences that have a digital character. It turns
out that the solution to (17) involves the Bernoulli numbers:

    [Closed form for a_n in terms of the Bernoulli numbers, valid for n ≥ 2]    (18)


This formula is probably the hardest asymptotic nut we have yet had to crack;
the solution in exercise 34 is an instructive review of many things we have done
before, with some slightly different twists.
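Even without the closed form, recurrence (17) can be evaluated directly, which gives a numerical check on the Patricia entry lg N − 0.31875 in the summary that follows. The sketch uses floating point, an assumption that is adequate up to a few hundred keys because 2^n must still fit in a double; the function name is illustrative.

```python
import math


def patricia_unsuccessful(n_max: int):
    """a_n from (17): average bit inspections in a random unsuccessful search."""
    a = [0.0] * (n_max + 1)  # a_0 = a_1 = 0
    for n in range(2, n_max + 1):
        # the root split is binomial(n, 1/2), conditioned on both sides being nonempty
        s = sum(math.comb(n, k) * a[k] for k in range(n))
        a[n] = 1 + s / (2 ** n - 2)
    return a


if __name__ == "__main__":
    a = patricia_unsuccessful(500)
    for n in (10, 100, 500):
        print(n, a[n], math.log2(n) - 0.31875)
```

The small cases are easy to confirm by hand: with two keys the root test is the only one, so a_2 = 1, and with three keys a_3 = 1 + (a_1 + a_2)/2 = 3/2.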


Summary of the analyses. As a result of all the complicated mathematics in
this section, the following facts are perhaps the most noteworthy:

a) The number of nodes needed to store N random keys in an M-ary trie,
with the trie branching terminated for subfiles of ≤ s keys, is approximately
N/(s ln M). This approximation is valid for large N, small s, and small M.
Since a trie node involves M link fields, we will need only about N/ln M link
fields if we choose s = M.


b) The number of digits or characters examined during a random search is
approximately log_M N for all methods considered. When M = 2, the various
analyses give us the following more accurate approximations to the number of
bit inspections:

                           Successful         Unsuccessful
    Trie search            lg N + 1.33275     lg N − 0.10995
    Digital tree search    lg N − 1.71665     lg N − 0.27395
    Patricia               lg N + 0.33275     lg N − 0.31875

(These approximations can all be expressed in terms of fundamental mathematical
constants; for example, 0.31875 stands for (ln π − γ)/ln 2 − 1/2.)

c) “Random” data here means that the M-ary digits are uniformly
distributed, as if the keys were real numbers between 0 and 1 expressed in M-ary
notation.  Digital  search  methods  are  insensitive  to  the  order  in  which  keys  are 
entered  into  the  file  (except  for  Algorithm  D,  which  is  only  slightly  sensitive  to 
the  order);  but  they  are  very  sensitive  to  the  distribution  of  digits.  For  example, 
if  0-bits  are  much  more  common  than  1-bits,  the  trees  will  become  much  more 
skewed  than  they  would  be  for  random  data  as  considered  in  the  analyses  cited 
above.  Exercise  5.2.2-53  works  out  one  example  of  what  happens  when  the  data 
is  biased  in  this  way. 
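The sensitivity to the distribution of digits is easy to observe. In the following sketch, which is an illustration only, each bit of a key is independently 0 with probability `p_zero` (a hypothetical parameter, as are the 128-bit key length and the function names), and we measure the average depth of a key in the resulting binary trie. As 0-bits become more common, keys share longer prefixes and the trie grows deeper than the lg N + 1.33275 expected for random data.

```python
import random
from typing import List

BITS = 128  # assumed key length; biased keys share long prefixes


def biased_key(rng: random.Random, p_zero: float) -> int:
    """A BITS-bit key whose bits are independently 0 with probability p_zero."""
    k = 0
    for _ in range(BITS):
        k = (k << 1) | (0 if rng.random() < p_zero else 1)
    return k


def trie_successful(keys: List[int]) -> float:
    """Average depth of a key in a binary trie: 1 + its longest common prefix
    with any other key (found among its neighbors in sorted order)."""
    ks = sorted(set(keys))
    total = 0
    for i, k in enumerate(ks):
        shared = 0
        for j in (i - 1, i + 1):
            if 0 <= j < len(ks):
                shared = max(shared, BITS - (k ^ ks[j]).bit_length())
        total += shared + 1
    return total / len(ks)


def average_depth(p_zero: float, n: int = 1024, seed: int = 7) -> float:
    """Average trie depth for n random keys with P(bit = 0) = p_zero."""
    rng = random.Random(seed)
    return trie_successful([biased_key(rng, p_zero) for _ in range(n)])


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.8, 0.9):
        print(f"P(0-bit) = {p}: average depth {average_depth(p):.2f}")
```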


EXERCISES 

1.  [00]  If  a tree  has  leaves,  what  does  a trie  have? 

2. [20] Design an algorithm for the insertion of a new key into an M-ary trie, using
the  conventions  of  Algorithm  T. 

3. [21] Design an algorithm for the deletion of a key from an M-ary trie, using the
conventions  of  Algorithm  T. 

► 4.  [21]  Most  of  the  360  entries  in  Table  1 are  blank  (null  links).  But  we  can 
compress  the  table  into  only  49  entries,  by  overlapping  nonblank  entries  with  blank 
ones  as  follows: 


[Table: the 49-entry compressed table, listing each Position from 1 through 49 and the Entry stored there.]

(Nodes  (1),  (2),  . . . , (12)  of  Table  1 begin,  respectively,  at  positions  20,  19,  3,  14,  1,  17, 
1,  7,  3,  20,  18,  4 within  this  compressed  table.) 

Show  that  if  the  compressed  table  is  substituted  for  Table  1,  Program  T will  still 
work,  but  not  quite  as  fast. 
► 5. [M26] (Y. N. Patt.) The trees of Fig. 31 have their letters arranged in alphabetic
order  within  each  family.  This  order  is  not  necessary,  and  if  we  rearrange  the  order 
of  nodes  within  the  families  before  constructing  binary  tree  representations  such  as 
(2)  we  may  get  a faster  search.  What  rearrangement  of  Fig.  31  is  optimum  from 
this  standpoint?  (Use  the  frequency  assumptions  of  Fig.  32,  and  find  the  forest  that 
minimizes  the  successful  search  time  when  it  has  been  represented  as  a binary  tree.) 

6. [15] What digital search tree is obtained if the fifteen 4-bit binary keys 0001,
0010,  0011,  . . . , 1111  are  inserted  in  increasing  order  by  Algorithm  D?  (Start  with 
0001  at  the  root  and  then  do  fourteen  insertions.) 

► 7.  [M26]  If  the  fifteen  keys  of  exercise  6 are  inserted  in  a different  order,  we  might 
get  a different  tree.  Of  all  the  15!  possible  permutations  of  these  keys,  which  is  the 
worst,  in  the  sense  that  it  produces  a tree  with  the  greatest  internal  path  length? 

8.  [20]  Consider  the  following  changes  to  Algorithm  D,  which  have  the  effect  of 
eliminating variable K′: Change “K′” to “K” in both places in step D2, and delete
the operation “K′ ← K” from step D1. Will the resulting algorithm still be valid for
searching  and  insertion? 

9.  [21]  Write  a MIX  program  for  Algorithm  D,  and  compare  it  to  Program  6.2.2T. 
You  may  use  binary  operations  such  as  SLB  (shift  left  AX  binary),  JAE  (jump  if  A even), 
etc.;  and  you  may  also  use  the  idea  of  exercise  8 if  it  helps. 

10.  [23]  Given  a file  in  which  all  the  keys  are  n-bit  binary  numbers,  and  given  a search 
argument K = b_1 b_2 ... b_n, suppose we want to find the maximum value of k such that
there is a key in the file beginning with the bit pattern b_1 b_2 ... b_k. How can we do this
efficiently  if  the  file  is  represented  as 

a)  a binary  search  tree  (Algorithm  6.2.2T)? 

b)  a binary  trie  (Algorithm  T)? 

c)  a binary  digital  search  tree  (Algorithm  D)? 

11. [21] Can Algorithm 6.2.2D be used without change to delete a node from a digital
search  tree? 

12. [25] After a random element is deleted from a random digital search tree
constructed by Algorithm D, is the resulting tree still random? (See exercise 11 and
Theorem  6.2.2H.) 

13. [20] (M-ary digital searching.) Explain how Algorithms T and D can be combined
into  a generalized  algorithm  that  is  essentially  the  same  as  Algorithm  D when  M = 2. 
What  changes  would  be  made  to  Table  1,  if  your  algorithm  is  used  for  M = 30? 

► 14.  [25]  Design  an  efficient  algorithm  that  can  be  performed  just  after  Algorithm  P 
has  terminated  successfully,  to  locate  all  places  where  K appears  in  the  TEXT. 

15.  [28]  Design  an  efficient  algorithm  that  can  be  used  to  construct  the  tree  used  by 
Patricia,  or  to  insert  new  TEXT  references  into  an  existing  tree.  Your  insertion  algorithm 
should  refer  to  the  TEXT  array  at  most  twice. 

16.  [22]  Why  is  it  desirable  for  Patricia  to  make  the  restriction  that  no  key  is  a prefix 
of  another? 

17. [M25] Find a way to express the solution of the recurrence

$$x_0 = x_1 = 0, \qquad x_n = a_n + m^{1-n}\sum_k \binom{n}{k}(m-1)^{n-k}x_k, \quad n \ge 2,$$

in terms of binomial transforms, by generalizing the technique of exercise 5.2.2-36.
18. [M21] Use the result of exercise 17 to express the solutions to (4) and (5) in terms
of functions U_n and V_n analogous to those defined in exercise 5.2.2-38.

19. [HM23] Find the asymptotic value of the function

$$K(n,s,m) = \sum_{k\ge2}\binom{n}{k}\binom{k}{s}\frac{(-1)^k}{m^{k-1}-1}$$

to O(1) as n → ∞, for fixed s ≥ 0 and m > 1. [The case s = 0 has already been solved
in exercise 5.2.2-50, and the case s = 1, m = 2 has been solved in exercise 5.2.2-48.]

► 20. [M30] Consider M-ary trie memory in which we use a sequential search whenever
reaching  a subfile  of  s or  fewer  keys.  (Algorithm  T is  the  special  case  s = 1.)  Apply 
the  results  of  the  preceding  exercises  to  analyze 

a)  the  average  number  of  trie  nodes; 

b)  the  average  number  of  digit  or  character  inspections  in  a successful  search;  and 

c)  the  average  number  of  comparisons  made  in  a successful  search. 

State your answers as asymptotic formulas as N → ∞, for fixed M and s; the answer
for (a) should be correct to within O(1), and the answers for (b) and (c) should be
correct to within O(N^{−1}). [When M = 2, this analysis applies also to the modified
radix exchange sort, in which subfiles of size ≤ s are sorted by insertion.]

21. [M25] How many of the nodes, in a random M-ary trie containing N keys, have
a null pointer in table entry 0? (For example, 9 of the 12 nodes in Table 1 have a null
pointer in the “⊔” position. “Random” in this exercise means as usual that the digits
of the keys are uniformly distributed between 0 and M − 1.)

22. [M25] How many trie nodes are on level l of a random M-ary trie containing
N keys,  for  l = 0,  1,  2,  ...  ? 

23.  [M26]  How  many  digit  inspections  are  made  on  the  average  during  an  unsuccessful 
search in an M-ary trie containing N random keys?

24. [M30] Consider an M-ary trie that has been represented as a forest (see Fig. 31).
Find exact and asymptotic expressions for

a) the average number of nodes in the forest;

b) the average number of times “P ← RLINK(P)” is performed during a random
successful search.

► 25. [M24] The mathematical derivations of asymptotic values in this section have
been quite difficult, involving complex variable theory, because it is desirable to get
more than just the leading term of the asymptotic behavior (and the second term is
intrinsically complicated). The purpose of this exercise is to show that elementary
methods are good enough to deduce some of the results in weaker form.

a) Prove by induction that the solution to (4) satisfies A_N ≤ M(N − 1)/(M − 1).

b) Let D_N = C_N − N H_{N−1}/ln M, where C_N is defined by (5). Prove that D_N =
O(N); hence C_N = N log_M N + O(N). [Hint: Use (a) and Theorem 1.2.7A.]

26.  [23]  Determine  the  value  of  the  infinite  product 

$$\Bigl(1-\frac12\Bigr)\Bigl(1-\frac14\Bigr)\Bigl(1-\frac18\Bigr)\Bigl(1-\frac1{16}\Bigr)\cdots$$

correct  to  five  decimal  places,  by  hand  calculation.  [Hint:  See  exercise  5.1.1-16.] 

27. [HM31] What is the asymptotic value of C_N, as given by (14), to within O(1)?
28. [HM26] Find the asymptotic average number of digit inspections when searching
in a random M-ary digital search tree, for general M > 2. Consider both successful
and unsuccessful search, and give your answer to within O(N^{−1}).

29. [HM40] What is the asymptotic average number of nodes, in an M-ary digital
search  tree,  for  which  all  M links  are  null?  (We  might  save  memory  space  by  eliminating 
such  nodes;  see  exercise  13.) 

30. [M24] Show that the Patrician generating function h_n(z) defined in (15) can be
expressed in the rather horrible form

$$h_n(z) = n\sum_{m\ge1} z^m \Biggl(\sum_{\substack{a_1+\cdots+a_m=n-1\\ a_1,\ldots,a_m\ge1}} \binom{n-1}{a_1,\ldots,a_m}\bigg/\Bigl((2^{a_1}-1)(2^{a_1+a_2}-1)\cdots(2^{a_1+\cdots+a_m}-1)\Bigr)\Biggr).$$

[Thus, if there is a simple formula for h_n(z), we will be able to simplify this rather
ungainly expression.]

31. [M21] Solve the recurrence (16).

32. [M21] What is the average value of the sum of all SKIP fields in a random Patrician
tree with N − 1 internal nodes?

33. [M30] Prove that (18) is a solution to the recurrence (17). [Hint: Consider the
generating function A(z) = Σ_{n≥0} a_n z^n/n!.]

34. [HM40] The purpose of this exercise is to find the asymptotic behavior of (18).

a) Prove that, if n ≥ 2,

$$\frac1n\sum_{2\le k<n}\binom{n}{k}\frac{B_k}{2^{k-1}-1} = \sum_{j\ge1}\left(\frac{1^{n-1}+2^{n-1}+\cdots+(2^j-1)^{n-1}}{2^{j(n-1)}}-\frac{2^j}{n}+\frac12\right).$$

b) Show that the summand in (a) is approximately 1/(e^x − 1) − 1/x + 1/2, where
x = n/2^j; the resulting sum equals the original sum plus O(n^{−1}).

c) Show that

$$\frac{1}{e^x-1}-\frac1x+\frac12 = \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}\zeta(z)\,\Gamma(z)\,x^{-z}\,dz, \qquad \text{for real } x>0.$$

d) Therefore the sum equals

$$\frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}\frac{\zeta(z)\,\Gamma(z)\,n^{-z}}{2^{-z}-1}\,dz + O(n^{-1});$$

evaluate this integral.

► 35. [M20] What is the probability that Patricia’s tree on five keys will be

[Figure: a Patricia tree on five keys, with SKIP fields labeled a, b, c, d.]

with the SKIP fields a, b, c, d as shown? (Assume that the keys have independent
random bits, and give your answer as a function of a, b, c, and d.)
36. [M25] There are five binary trees with three internal nodes. If we consider how
frequently each particular one of these occurs as the search tree in various algorithms,
for random data, we find the following different probabilities:

    Tree shape:                          (1)    (2)    (3)    (4)    (5)
    Tree search (Algorithm 6.2.2T)       1/6    1/6    1/3    1/6    1/6
    Digital tree search (Algorithm D)    1/8    1/8    1/2    1/8    1/8
    Patricia (Algorithm P)               1/7    1/7    3/7    1/7    1/7

    [The five tree diagrams are not reproduced; shape (3) is the balanced tree.]

(Notice that the digital search tree tends to be balanced more often than the others.)
In exercise 6.2.2-5 we found that the probability of a tree in the tree search algorithm
was ∏(1/s(x)), where the product is over all internal nodes x, and s(x) is the number
of internal nodes in the subtree rooted at x. Find similar formulas for the probability
of a tree in the case of (a) Algorithm D; (b) Algorithm P.

► 37. [M22] Consider a binary tree with b_l external nodes on level l. The text observes
that the running time for unsuccessful searching in digital search trees is not directly
related to the external path length Σ_l l b_l, but instead it is essentially proportional to
the modified external path length Σ_l l b_l 2^{−l}. Prove or disprove: The smallest modified
external path length, over all trees with N external nodes, occurs when all of the
external nodes appear on at most two adjacent levels. (See exercise 5.3.1-20.)

38. [M40] Develop an algorithm to find the n-node tree having the minimum value
of α · (internal path length) + β · (modified external path length), given α and β, in the
sense of exercise 37.


39.  [M43]  Develop  an  algorithm  to  find  optimum  digital  search  trees,  analogous  to 
the  optimum  binary  search  trees  considered  in  Section  6.2.2. 

► 40. [25] Let a_0 a_1 a_2 ... be a periodic binary sequence with a_{N+k} = a_k for all k ≥ 0.
Show that there is a way to represent any fixed sequence of this type in O(N) memory
locations, so that the following operation can be done in only O(N) steps: Given any
binary pattern b_0 b_1 ... b_{n−1}, determine how often the pattern occurs in the period
(thus, find how many values of p exist with 0 ≤ p < N and b_k = a_{p+k} for 0 ≤ k < n).
The length n of the pattern is variable as well as the pattern itself. Assume that each
memory location can hold arbitrary integers between 0 and N. [Hint: See exercise 14.]
41. [HM28] This is an application to group theory. Let G be the free group on the
letters {a_1, ..., a_n}, namely the set of all strings α = b_1 ... b_r, where each b_i is one of the
a_j or a_j^{−1} and no adjacent pair a_j a_j^{−1} or a_j^{−1} a_j occurs. The inverse of α is
b_r^{−1} ... b_1^{−1}, and we multiply two such strings by concatenating them and canceling
adjacent inverse pairs. Let H be the subgroup of G generated by the strings {β_1, ..., β_p},
namely the set of all elements of G that can be written as products of the β’s and their
inverses. According to a well-known theorem of Jakob Nielsen (see Marshall Hall, The
Theory of Groups (New York: Macmillan, 1959), Chapter 7), we can always find generators
θ_1, ..., θ_m of H, with m ≤ p, having the property that the middle character of θ_i (or at
least one of the two central characters of θ_i if it has even length) is never canceled in the
expressions θ_i θ_j^e or θ_j^e θ_i, e = ±1, unless j = i and e = −1. This property implies that
there is a simple algorithm for testing whether an arbitrary element of G is in H: Record the
2m keys θ_1, ..., θ_m, θ_1^{−1}, ..., θ_m^{−1} in a character-oriented search tree, using the 2n letters
a_1, ..., a_n, a_1^{−1}, ..., a_n^{−1}. Let α = b_1 ... b_r be a given element of G; if r = 0, α is obviously
in H. Otherwise look up α, finding the longest prefix b_1 ... b_k that matches a key. If
there is more than one key beginning with b_1 ... b_k, α is not in H; otherwise let the
unique such key be b_1 ... b_k c_1 ... c_l = θ_i^e, and replace α by θ_i^{−e} α = c_l^{−1} ... c_1^{−1} b_{k+1} ... b_r.
If this new value of α is longer than the old (that is, if l > k), α is not in H; otherwise
repeat the process on the new value of α. The Nielsen property implies that this
algorithm will always terminate. If α is eventually reduced to the null string, we can
reconstruct the representation of the original α as a product of θ’s.

For example, let {θ_1, θ_2, θ_3} = {bbb, b^{−1}a^{−1}b^{−1}, ba^{−1}b} and α = bbabaab. The forest

[Figure: the forest for the six keys θ_1^{±1}, θ_2^{±1}, θ_3^{±1}.]

can be used with the algorithm above to deduce that α = θ_1 θ_3^{−1} θ_1 θ_3^{−1} θ_2^{−1}. Implement
this algorithm, given the θ’s as input to your program.
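The reduction procedure described in this exercise is concrete enough to sketch. The Python below is an illustration under conventions of our own choosing, not the book's solution: each letter is one character, an uppercase letter stands for the inverse of the corresponding lowercase generator (so `A` means a^{−1}), and a plain scan over a dictionary of keys stands in for the character-oriented search tree. It assumes the θ's given already have the Nielsen property; the `seen` set is only a guard against looping on inputs that lack it.

```python
from typing import Dict, List, Optional, Tuple


def inverse(word: str) -> str:
    """Inverse in the free group: reverse the word and invert every letter
    (lowercase x stands for a generator, uppercase X for its inverse)."""
    return word[::-1].swapcase()


def multiply(u: str, v: str) -> str:
    """Product of two reduced words: concatenate, canceling inverse pairs."""
    out = list(u)
    for ch in v:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def express(alpha: str, thetas: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Return [(i, e), ...] with alpha = theta_i1^e1 theta_i2^e2 ... (i is
    1-based, e = +1 or -1), or None if the procedure rejects alpha."""
    keys: Dict[str, Tuple[int, int]] = {}
    for i, t in enumerate(thetas, start=1):
        keys[t] = (i, 1)
        keys[inverse(t)] = (i, -1)
    factors: List[Tuple[int, int]] = []
    seen = set()
    while alpha:                       # r = 0 means alpha is in H
        if alpha in seen:              # guard only; the Nielsen property rules this out
            return None
        seen.add(alpha)
        # Longest prefix b_1 ... b_k of alpha that begins some key.
        k = 0
        while k < len(alpha) and any(key.startswith(alpha[:k + 1]) for key in keys):
            k += 1
        matches = [key for key in keys if key.startswith(alpha[:k])]
        if k == 0 or len(matches) != 1:
            return None                # no key, or several keys, begin with b_1 ... b_k
        key = matches[0]               # b_1 ... b_k c_1 ... c_l = theta_i^e
        rest = key[k:]                 # c_1 ... c_l
        if len(rest) > k:              # l > k: alpha would get longer
            return None
        factors.append(keys[key])
        alpha = inverse(rest) + alpha[k:]   # theta_i^(-e) alpha
    return factors


if __name__ == "__main__":
    thetas = ["bbb", "BAB", "bAb"]     # bbb, b^-1 a^-1 b^-1, b a^-1 b
    print(express("bbabaab", thetas))  # [(1, 1), (3, -1), (1, 1), (3, -1), (2, -1)]
```

The factor list reproduces the representation α = θ_1 θ_3^{−1} θ_1 θ_3^{−1} θ_2^{−1} of the example, and multiplying those factors back together with `multiply` recovers bbabaab.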

42. [23] (Front and rear compression.) When a set of binary keys is being used as an
index, to partition a larger file, we need not store the full keys. For example, if the
sixteen keys of Fig. 34 are used, they can be truncated at the right, as soon as enough
digits have been given to identify them uniquely: 0000, 0001, 00100, 00101, 010, ...,
1110001. These truncated keys can be used to partition a file into seventeen parts,
where for example the fifth part consists of all keys beginning with 0011 or 010, and
the last part contains all keys beginning with 111001, 11101, or 1111. The truncated
keys can be represented more compactly if we suppress all leading digits common to
the previous key: 0000, ∘∘∘1, ∘∘100, ∘∘∘∘1, ∘10, ..., ∘∘∘∘∘∘1. The bit following a ∘ is
always 1, so it may be suppressed. A large file will have many ∘’s, and we need store
only the number of ∘’s and the values of the following bits.

Show that the total number of bits in the compressed file, excluding ∘’s and the
following 1-bits, is always equal to the number of nodes in the binary trie for the keys.

(Consequently the average total number of such bits in the entire index is about
N/ln 2, only 1.44 bits per key. This compression technique was shown to the author by
A. Heller and R. L. Johnsen. Still further compression is possible, since we need only
represent the trie structure; see Theorem 2.3.1A.)

43.  [HM42]  Analyze  the  height  of  a random  M-ary  trie  that  has  N keys  and  cutoff 
parameter  s as  in  exercise  20.  (When  s = 1,  this  is  the  length  of  the  longest  common 
prefix of N long random words in an M-ary alphabet.)

► 44.  [30]  (J.  L.  Bentley  and  R.  Sedgewick.)  Explore  a ternary  representation  of  tries, 
in  which  left  and  right  links  correspond  to  the  horizontal  branches  of  (2)  while  middle 
links  correspond  to  the  downward  branches. 

► 45.  [M25]  If  the  seven  keys  of  Fig.  33  are  inserted  in  random  order  by  the  algorithm 
of  exercise  15,  what  is  the  probability  of  obtaining  the  tree  shown? 
So far we have considered search methods based on comparing the given
argument  K to  the  keys  in  the  table,  or  using  its  digits  to  govern  a branching 
process.  A third  possibility  is  to  avoid  all  this  rummaging  around  by  doing  some 
arithmetical  calculation  on  K , computing  a function  f(K)  that  is  the  location 
of  K and  the  associated  data  in  the  table. 

For  example,  let’s  consider  again  the  set  of  31  English  words  that  we  have 
subjected  to  various  search  strategies  in  Sections  6.2.2  and  6.3.  Table  1 shows 
a short MIX program that transforms each of the 31 keys into a unique number
f(K) between −10 and 30. If we compare this method to the MIX programs
for the other methods we have considered (for example, binary search, optimal
tree search, trie memory, digital tree search), we find that it is superior from
the standpoint of both space and speed, except that binary search uses slightly
less space. In fact, the average time for a successful search, using the program
of Table 1 with the frequency data of Fig. 12, is only about 17.8u, and only 41
table locations are needed to store the 31 keys.

Unfortunately, such functions f(K) aren’t very easy to discover. There are
41^{31} ≈ 10^{50} possible functions from a 31-element set into a 41-element set, and
only 41 · 40 · ... · 11 = 41!/10! ≈ 10^{43} of them will give distinct values for each
argument; thus only about one of every 10 million functions will be suitable.

Functions  that  avoid  duplicate  values  are  surprisingly  rare,  even  with  a fairly 
large  table.  For  example,  the  famous  “birthday  paradox”  asserts  that  if  23  or 
more  people  are  present  in  a room,  chances  are  good  that  two  of  them  will  have 
the  same  month  and  day  of  birth!  In  other  words,  if  we  select  a random  function 
that  maps  23  keys  into  a table  of  size  365,  the  probability  that  no  two  keys  map 
into  the  same  location  is  only  0.4927  (less  than  one-half).  Skeptics  who  doubt 
this  result  should  try  to  find  the  birthday  mates  at  the  next  large  parties  they 
attend.  [The  birthday  paradox  was  discussed  informally  by  mathematicians  in 
the  1930s,  but  its  origin  is  obscure;  see  I.  J.  Good,  Probability  and  the  Weighing 
of Evidence (Griffin, 1950), 38. See also R. von Mises, İstanbul Üniversitesi
Fen Fakültesi Mecmuası 4 (1939), 145-163, and W. Feller, An Introduction to
Probability  Theory  (New  York:  Wiley,  1950),  Section  2.3.] 
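Both counts can be reproduced directly, assuming, as the text does, that a "random function" sends each key to every slot with equal probability independently of the others. The helper name below is our own. The probability that n keys land in n different slots out of m is m(m − 1) ... (m − n + 1)/m^n.

```python
import math


def prob_all_distinct(n: int, m: int) -> float:
    """Probability that n keys, each sent to one of m slots independently and
    uniformly at random, all land in different slots: m(m-1)...(m-n+1)/m^n."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p


if __name__ == "__main__":
    # Birthday paradox: 22 people are not quite enough, 23 are.
    print(f"{prob_all_distinct(22, 365):.4f}  {prob_all_distinct(23, 365):.4f}")  # 0.5243  0.4927

    # The 31-key, 41-slot count from the text.
    all_functions = 41 ** 31
    one_to_one = math.perm(41, 31)        # 41 * 40 * ... * 11 = 41!/10!
    print(f"about 10^{math.log10(all_functions):.0f} functions, "
          f"10^{math.log10(one_to_one):.0f} of them one-to-one, "
          f"so about 1 in {all_functions / one_to_one:,.0f} is suitable")
```

The last ratio comes out at roughly ten million, in agreement with the estimate given earlier (`math.perm` requires Python 3.8 or later).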

On  the  other  hand,  the  approach  used  in  Table  1 is  fairly  flexible  [see 
M. Greniewski and W. Turski, CACM 6 (1963), 322-323], and for a medium-sized
table a suitable function can be found after about a day’s work. In
fact  it  is  rather  amusing  to  solve  a puzzle  like  this.  Suitable  techniques  have 
been  discussed  by  many  people,  including  for  example  R.  Sprugnoli,  CACM  20 
(1977),  841-850,  22  (1979),  104,  553;  R.  J.  Cichelli,  CACM  23  (1980),  17-19; 
T.  J.  Sager,  CACM  28  (1985),  523-532,  29  (1986),  557;  B.  S.  Majewski,  N.  C. 
Wormald,  G.  Havas,  and  Z.  J.  Czech,  Comp.  J.  39  (1996),  547-554;  Czech, 
Havas,  and  Majewski,  Theoretical  Comp.  Sci.  182  (1997),  1-143.  See  also  the 
article by J. Körner and K. Marton, Europ. J. Combinatorics 9 (1988), 523-530,
for  theoretical  limitations  on  perfect  hash  functions. 
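To convey the flavor of such a search without reproducing the method of Table 1, here is a brute-force sketch over one hypothetical family of functions, h(K) = ((A·K) mod 2^{64}) >> (64 − m). It tries random odd multipliers A until every key lands in a different slot. The word list, the base-256 reading of the words, the 16-slot table, and the function names are all illustrative assumptions. As the counting argument above suggests, a blind search like this is practical only when the table is comfortably larger than the key set: here roughly one multiplier in 40 succeeds.

```python
import random
from typing import List, Optional

W_BITS = 64


def key_value(word: str) -> int:
    """Read an ASCII word as a base-256 number (an illustrative encoding)."""
    return int.from_bytes(word.encode("ascii"), "big")


def find_perfect_multiplier(words: List[str], m_bits: int,
                            tries: int = 100_000, seed: int = 0) -> Optional[int]:
    """Try random odd multipliers A until h(K) = ((A*K) mod 2^64) >> (64 - m_bits)
    sends every word to a different slot; return A, or None if none is found."""
    rng = random.Random(seed)
    mask = (1 << W_BITS) - 1
    values = [key_value(w) & mask for w in words]
    for _ in range(tries):
        a = rng.getrandbits(W_BITS) | 1
        slots = {((a * v) & mask) >> (W_BITS - m_bits) for v in values}
        if len(slots) == len(values):
            return a
    return None


if __name__ == "__main__":
    words = ["THE", "OF", "AND", "TO", "A", "IN", "THAT", "IS", "I", "IT"]  # sample keys
    a = find_perfect_multiplier(words, m_bits=4)   # 10 keys into 16 slots
    print(hex(a), {w: ((a * key_value(w)) % 2**64) >> 60 for w in words})
```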

Of course this method has a serious flaw, since the contents of the table
must be known in advance; adding one more key will probably ruin everything,
making it necessary to start over almost from scratch. We can obtain a much
more versatile method if we give up the idea of uniqueness, permitting different
keys to yield the same value f(K), and using a special method to resolve any
ambiguity after f(K) has been computed.

[Table 1: Transforming a set of keys into unique addresses.]

These  considerations  lead  to  a popular  class  of  search  methods  commonly 
known  as  hashing  or  scatter  storage  techniques.  The  verb  “to  hash”  means 
to  chop  something  up  or  to  make  a mess  out  of  it;  the  idea  in  hashing  is  to 
scramble  some  aspects  of  the  key  and  to  use  this  partial  information  as  the  basis 
for  searching.  We  compute  a hash  address  h(K)  and  begin  searching  there. 

The birthday paradox tells us that there will probably be distinct keys
K_i ≠ K_j that hash to the same value h(K_i) = h(K_j). Such an occurrence is
called  a collision,  and  several  interesting  approaches  have  been  devised  to  handle 
the  collision  problem.  In  order  to  use  a hash  table,  programmers  must  make  two 
almost  independent  decisions:  They  must  choose  a hash  function  h(K),  and  they 
must  select  a method  for  collision  resolution.  We  shall  now  consider  these  two 
aspects  of  the  problem  in  turn. 


Hash  functions.  To  make  things  more  explicit,  let  us  assume  throughout  this 
section  that  our  hash  function  h takes  on  at  most  M different  values,  with 

0 ≤ h(K) < M,     (1)

for  all  keys  K.  The  keys  in  actual  files  that  arise  in  practice  usually  have  a great 
deal  of  redundancy;  we  must  be  careful  to  find  a hash  function  that  breaks  up 
clusters  of  almost  identical  keys,  in  order  to  reduce  the  number  of  collisions. 
It is theoretically impossible to define a hash function that creates truly
random  data  from  the  nonrandom  data  in  actual  files.  But  in  practice  it  is  not 
difficult  to  produce  a pretty  good  imitation  of  random  data,  by  using  simple 
arithmetic  as  we  have  discussed  in  Chapter  3.  And  in  fact  we  can  often  do  even 
better,  by  exploiting  the  nonrandom  properties  of  actual  data  to  construct  a hash 
function  that  leads  to  fewer  collisions  than  truly  random  keys  would  produce. 

Consider,  for  example,  the  case  of  10-digit  keys  on  a decimal  computer. 
One hash function that suggests itself is to let M = 1000, say, and to let h(K)
be  three  digits  chosen  from  somewhere  near  the  middle  of  the  20-digit  product 
K x K.  This  would  seem  to  yield  a fairly  good  spread  of  values  between  000 
and  999,  with  low  probability  of  collisions.  Experiments  with  actual  data  show, 
in  fact,  that  this  “middle  square”  method  isn’t  bad,  provided  that  the  keys  do 
not  have  a lot  of  leading  or  trailing  zeros;  but  it  turns  out  that  there  are  safer 
and  saner  ways  to  proceed,  just  as  we  found  in  Chapter  3 that  the  middle  square 
method  is  not  an  especially  good  random  number  generator. 
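Here is a minimal sketch of the middle-square hash for 10-digit keys. As one arbitrary reading of "near the middle," it takes digits 10 through 12, counting from the right, of the 20-digit square. The function name and the digit positions are assumptions for illustration. It also exhibits the weakness just mentioned: a key ending in six zeros has a square ending in twelve zeros, so every such key receives the hash value 000.

```python
import random


def middle_square(k: int) -> int:
    """Three digits from the middle of K*K for a 10-digit decimal key K:
    drop the 9 low-order digits, keep the next 3 (digits 10-12 from the right)."""
    if not 0 <= k < 10**10:
        raise ValueError("expected a key of at most 10 decimal digits")
    return (k * k // 10**9) % 1000


if __name__ == "__main__":
    rng = random.Random(3)
    spread = {middle_square(rng.randrange(10**9, 10**10)) for _ in range(500)}
    print(len(spread), "distinct values for 500 random keys")

    # Keys with many trailing zeros: K = j * 10**6 gives K*K = j*j * 10**12,
    # so every one of them hashes to 000.
    print({middle_square(j * 10**6) for j in range(1000, 10000)})   # {0}
```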

Extensive  tests  on  typical  files  have  shown  that  two  major  types  of  hash 
functions  work  quite  well.  One  is  based  on  division,  and  the  other  is  based  on 
multiplication. 

The  division  method  is  particularly  easy;  we  simply  use  the  remainder 
modulo  M: 

h(K)  = K mod  M.  (2) 

In  this  case,  some  values  of  M are  obviously  much  better  than  others.  For 
example,  if  M is  an  even  number,  h(K)  will  be  even  when  K is  even  and  odd 
when K is odd, and this will lead to a substantial bias in many files. It would
be  even  worse  to  let  M be  a power  of  the  radix  of  the  computer,  since  K mod  M 
would  then  be  simply  the  least  significant  digits  of  K (independent  of  the  other 
digits).  Similarly  we  can  argue  that  M probably  shouldn’t  be  a multiple  of  3; 
for  if  the  keys  are  alphabetic,  two  keys  that  differ  only  by  permutation  of  letters 
would  then  differ  in  numeric  value  by  a multiple  of  3.  (This  occurs  because 
2^{2n} mod 3 = 1 and 10^n mod 3 = 1.) In general, we want to avoid values of M
that divide r^k ± a, where k and a are small numbers and r is the radix of the
alphabetic character set (usually r = 64, 256, or 100), since a remainder modulo
such a value of M tends to be largely a simple superposition of the key digits.
Such considerations suggest that we choose M to be a prime number such that
r^k ≢ ±a (modulo M) for small k and a. This choice has been found to be quite
satisfactory  in  most  cases. 
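The following sketch illustrates the division method (2) and the pitfalls just listed. It reads alphabetic keys as base-256 numbers, one byte per character; that encoding (r = 256), the sample keys, and the function names are assumptions for illustration. With M = 256, a power of the radix, only the last character matters. With M = 3, anagrams always collide, because 256 mod 3 = 1 makes the remainder depend only on the sum of the character codes. The prime M = 1009 shows neither defect on these samples.

```python
def key_value(word: str, radix: int = 256) -> int:
    """Read a word as a radix-r number, one digit per character."""
    v = 0
    for ch in word:
        v = v * radix + ord(ch)
    return v


def h_div(k: int, m: int) -> int:
    """Division hashing: h(K) = K mod M."""
    return k % m


if __name__ == "__main__":
    names = ["PART1", "ZZZZ1", "STOP", "POTS", "TOPS", "SPOT"]
    for m in (256, 3, 1009):
        print(m, [h_div(key_value(w), m) for w in names])
```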

For example, on the MIX computer we could choose M = 1009, computing
h(K) by the sequence

    LDX   K          rX ← K.
    ENTA  0          rA ← 0.                  (3)
    DIV   =1009=     rX ← K mod 1009.


The  multiplicative  hashing  scheme  is  equally  easy  to  do,  but  it  is  slightly 
harder  to  describe  because  we  must  imagine  ourselves  working  with  fractions 
instead of with integers. Let w be the word size of the computer, so that w is
usually 10^{10} or 2^{30} for MIX; we can regard an integer A as the fraction A/w if we
imagine the radix point to be at the left of the word. The method is to choose
some integer constant A relatively prime to w, and to let

$$h(K) = \left\lfloor M\left(\left(\frac{A}{w}K\right) \bmod 1\right)\right\rfloor. \qquad (4)$$


In  this  case  we  usually  let  M be  a power  of  2 on  a binary  computer,  so  that 
h(K)  consists  of  the  leading  bits  of  the  least  significant  half  of  the  product  AK. 

In MIX code, if we let M = 2^m and assume a binary radix, the multiplicative
hash function is

    LDA   K      rA ← K.
    MUL   A      rAX ← AK.
    ENTA  0      rAX ← AK mod w.              (5)
    SLB   m      Shift rAX m bits to the left.

Now  h(K)  appears  in  register  A.  Since  MIX  has  rather  slow  multiplication  and 
shift  instructions,  this  sequence  takes  exactly  as  long  to  compute  as  (3);  but  on 
many  machines  multiplication  is  significantly  faster  than  division. 
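Here is a sketch of (4) and (5) for an assumed 64-bit binary word, w = 2^{64}, rather than MIX's 2^{30}. The multiplier 0x9E3779B97F4A7C15 is an odd integer very close to φ^{−1}w, the choice discussed below in connection with Fig. 37. It and the function names are illustrative assumptions. Reducing the product modulo w plays the role of `ENTA 0`, and the right shift by 64 − m keeps the same m leading bits that `SLB m` moves into register A. The second function evaluates (4) literally with exact fractions, as a cross-check.

```python
import math
from fractions import Fraction

W_BITS = 64
W = 1 << W_BITS
A = 0x9E3779B97F4A7C15  # odd, hence relatively prime to w; A/w ~ 0.6180339887


def h_mult(k: int, m_bits: int) -> int:
    """Multiplicative hashing with M = 2**m_bits, as in (5):
    take AK mod w, then keep its m leading bits."""
    return ((A * k) % W) >> (W_BITS - m_bits)


def h_mult_by_formula(k: int, m_bits: int) -> int:
    """The same value computed literally from (4): floor(M * ((A/w) K mod 1))."""
    frac = (Fraction(A, W) * k) % 1
    return math.floor((1 << m_bits) * frac)


if __name__ == "__main__":
    for k in (1, 2, 3, 1009, 123456789):
        print(k, h_mult(k, 10), h_mult_by_formula(k, 10))
```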

In  a sense  this  method  can  be  regarded  as  a generalization  of  (3),  since 
we could for example take A to be an approximation to w/1009; multiplying
by  the  reciprocal  of  a constant  is  often  faster  than  dividing  by  that  constant. 
The  technique  of  (5)  is  almost  a “middle  square”  method,  but  there  is  one 
important  difference:  We  shall  see  that  multiplication  by  a suitable  constant  has 
demonstrably  good  properties. 
One of the nice features of the multiplicative scheme is that no information
is lost when we blank out the A register in (5); we could determine K again,
given only the contents of rAX after (5) has finished. The reason is that A is
relatively prime to w, so Euclid’s algorithm can be used to find a constant A′
with AA′ mod w = 1; this implies that K = (A′(AK mod w)) mod w. In other
words, if f(K) denotes the contents of register X just before the SLB instruction
in (5), then

    K_1 ≠ K_2 implies f(K_1) ≠ f(K_2).     (6)

Of course f(K) takes on values in the range 0 to w − 1, so it isn’t any good as
a hash function, but it can be very useful as a scrambling function, namely a
function  satisfying  (6)  that  tends  to  randomize  the  keys.  Such  a function  can  be 
very  useful  in  connection  with  the  tree  search  algorithms  of  Section  6.2.2,  if  the 
order  of  keys  is  unimportant,  since  it  removes  the  danger  of  degeneracy  when 
keys  enter  the  tree  in  increasing  order.  (See  exercise  6.2.2-10.)  A scrambling 
function  is  also  useful  in  connection  with  the  digital  tree  search  algorithm  of 
Section  6.3,  if  the  bits  of  the  actual  keys  are  biased. 
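The following sketch shows the scrambling function f(K) = AK mod w and its inverse, again assuming w = 2^{64} and the same illustrative multiplier. The extended form of Euclid's algorithm finds A′ with AA′ mod w = 1, so f can be undone exactly. The second half inserts the keys 1, 2, ..., 1000 into an unbalanced binary search tree twice, once in increasing order and once after scrambling, to show the degeneracy that scrambling avoids. The insertion routine is a plain one written for this illustration, not a transcription of Algorithm 6.2.2T, and all names are hypothetical.

```python
W = 1 << 64
A = 0x9E3779B97F4A7C15  # illustrative odd multiplier


def modular_inverse(a: int, m: int) -> int:
    """A' with a*A' mod m = 1, by the extended Euclidean algorithm."""
    old_r, r = a, m
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a is not relatively prime to m")
    return old_s % m


A_INV = modular_inverse(A, W)


def scramble(k: int) -> int:
    """f(K) = AK mod w: one-to-one, and it tends to randomize the keys."""
    return (A * k) % W


def unscramble(f: int) -> int:
    """Recover K = (A'(AK mod w)) mod w."""
    return (A_INV * f) % W


def bst_height(keys) -> int:
    """Number of levels in the binary search tree built by inserting keys in order."""
    left, right = {}, {}
    root, height = None, 0
    for k in keys:
        if root is None:
            root, height = k, 1
            continue
        node, depth = root, 1
        while True:
            links = left if k < node else right
            depth += 1
            if node not in links:
                links[node] = k
                break
            node = links[node]
        height = max(height, depth)
    return height


if __name__ == "__main__":
    keys = range(1, 1001)
    assert all(unscramble(scramble(k)) == k for k in keys)
    print(bst_height(keys), bst_height(scramble(k) for k in keys))
```

In increasing order the tree degenerates into a path of 1000 levels; after scrambling, its height stays in the low tens.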

Another  feature  of  the  multiplicative  hash  method  is  that  it  makes  good 
use  of  the  nonrandomness  found  in  many  files.  Actual  sets  of  keys  often  have 
a preponderance of arithmetic progressions, where {K, K+d, K+2d, ..., K+td}
all appear in the file; for example, consider alphabetic names like {PART1, PART2,
PART3}  or  {TYPEA,  TYPEB,  TYPEC}.  The  multiplicative  hash  method  converts 
an  arithmetic  progression  into  an  approximate  arithmetic  progression  h(K), 
h(K+d),  h(K  + 2d),  ...  of  distinct  hash  values,  reducing  the  number  of  collisions 
from  what  we  would  expect  in  a random  situation.  The  division  method  has  this 
same  property. 


Fig.  37.  Fibonacci  hashing. 

Figure 37 illustrates this aspect of multiplicative hashing in a particularly
interesting case. Suppose that A/w is approximately the golden ratio φ⁻¹ =
(√5 − 1)/2 ≈ 0.6180339887; then the successive values h(K), h(K+1), h(K+2),
… have essentially the same behavior as the successive hash values h(0), h(1),
h(2), …, so the following experiment suggests itself: Starting with the line
segment [0 . . 1], we successively mark off the points {φ⁻¹}, {2φ⁻¹}, {3φ⁻¹}, …,
where {x} denotes the fractional part of x (namely x − ⌊x⌋, or x mod 1). As shown
in  Fig.  37,  these  points  stay  very  well  separated  from  each  other;  in  fact,  each 
newly  added  point  falls  into  one  of  the  largest  remaining  intervals,  and  divides 
it  in  the  golden  ratio!  [This  phenomenon  was  observed  long  ago  by  botanists 
Louis  and  Auguste  Bravais,  Annales  des  Sciences  Naturelles  7 (1837),  42-110, 
who  gave  an  illustration  equivalent  to  Fig.  37  and  related  it  to  the  Fibonacci 
sequence. See also S. Świerczkowski, Fundamenta Math. 46 (1958), 187-189.]

The  remarkable  scattering  property  of  the  golden  ratio  is  actually  just  a 
special  case  of  a very  general  result,  originally  conjectured  by  Hugo  Steinhaus 
and first proved by Vera Turán Sós [Acta Math. Acad. Sci. Hung. 8 (1957),
461-471; Ann. Univ. Sci. Budapest. Eötvös Sect. Math. 1 (1958), 127-134]:

Theorem S. Let θ be any irrational number. When the points {θ}, {2θ}, …,
{nθ} are placed in the line segment [0 . . 1], the n + 1 line segments formed have
at most three different lengths. Moreover, the next point {(n+1)θ} will fall in
one of the largest existing segments. ∎

Thus, the points {θ}, {2θ}, …, {nθ} are spread out very evenly between 0 and 1.
If θ is rational, the same theorem holds if we give a suitable interpretation to
the segments of length 0 that appear when n is greater than or equal to the
denominator of θ. A proof of Theorem S, together with a detailed analysis of
the underlying structure of the situation, appears in exercise 8; it turns out that
the segments of a given length are created and destroyed in a first-in-first-out
manner. Of course, some θ's are better than others, since for example a value
that is near 0 or 1 will start out with many small segments and one large segment.
Exercise 9 shows that the two numbers φ⁻¹ and φ⁻² = 1 − φ⁻¹ lead to the “most
uniformly distributed” sequences, among all numbers θ between 0 and 1.
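Theorem S is easy to check experimentally. In this sketch floating-point numbers stand in for irrational θ, and segment lengths that agree to within a tolerance of 1e-9 are treated as equal (both are assumptions of the sketch); it counts the distinct segment lengths and checks where the next point lands.

```python
import bisect
import math

PHI_INV = (math.sqrt(5) - 1) / 2   # golden ratio inverse, about 0.6180339887


def cut_points(theta: float, n: int) -> list:
    """Sorted endpoints 0 and 1 together with {theta}, {2 theta}, ..., {n theta}."""
    return sorted([0.0, 1.0] + [(k * theta) % 1.0 for k in range(1, n + 1)])


def count_distinct_gaps(theta: float, n: int, tol: float = 1e-9) -> int:
    """Number of different segment lengths, treating lengths within tol as equal."""
    pts = cut_points(theta, n)
    gaps = sorted(b - a for a, b in zip(pts, pts[1:]))
    return 1 + sum(1 for a, b in zip(gaps, gaps[1:]) if b - a > tol)


def next_point_hits_largest(theta: float, n: int, tol: float = 1e-9) -> bool:
    """True if {(n+1) theta} lands inside one of the largest existing segments."""
    pts = cut_points(theta, n)
    x = ((n + 1) * theta) % 1.0
    j = bisect.bisect_right(pts, x)          # pts[j-1] <= x < pts[j]
    largest = max(b - a for a, b in zip(pts, pts[1:]))
    return abs((pts[j] - pts[j - 1]) - largest) < tol
```

For example, `count_distinct_gaps(PHI_INV, n)` never exceeds 3, whatever n is chosen.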

The theory above suggests Fibonacci hashing, where we choose the constant
A to be the nearest integer to φ⁻¹w that is relatively prime to w. For example
if MIX were a decimal computer we would take

A = [ + | 61 | 80 | 33 | 98 | 87 ]   (7)

This multiplier will spread out alphabetic keys like LIST1, LIST2, LIST3 very
nicely. But notice what happens when we have an arithmetic series in the
fourth character position, as in the keys SUM1␣, SUM2␣, SUM3␣: The effect is
as if Theorem S were being used with θ = {100A/w} = .80339887 instead of
θ = .6180339887 = A/w. The resulting behavior is still all right, in spite of the
fact that this value of θ is not quite as good as φ⁻¹. On the other hand, if the
progression occurs in the second character position, as in A1␣␣␣, A2␣␣␣, A3␣␣␣,
the effective θ is .9887, and this is probably too close to 1.
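On a binary machine the same idea carries over directly. The sketch below assumes a 32-bit word (w = 2^32) and M = 2^m, so that h(K) is simply the top m bits of AK mod w; the names `fibonacci_multiplier` and `fib_hash` are illustrative, not from the text.

```python
import math

W_BITS = 32
W = 1 << W_BITS                      # assumed binary word size w = 2**32
PHI_INV = (math.sqrt(5) - 1) / 2


def fibonacci_multiplier(w: int = W) -> int:
    """Nearest integer to phi^-1 * w that is relatively prime to w."""
    target = PHI_INV * w
    base = math.floor(target)
    # Try nearby candidates in order of their distance from the target.
    candidates = sorted(range(base - 2, base + 3), key=lambda a: abs(a - target))
    return next(a for a in candidates if math.gcd(a, w) == 1)


A = fibonacci_multiplier()


def fib_hash(k: int, m: int) -> int:
    """h(K) = floor(M * ((A*K mod w) / w)) with M = 2**m, i.e. the top m bits."""
    return ((A * k) % W) >> (W_BITS - m)
```

With m = 10, the consecutive keys 1, 2, …, 100 land in 100 different buckets, as Theorem S suggests they should.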

Therefore we might do better with a multiplier like

A = [ + | 61 | 61 | 61 | 61 | 61 ]

in place of (7); such a multiplier will separate out consecutive sequences of keys
that differ in any character position. Unfortunately this choice suffers from
another problem analogous to the difficulty of dividing by $r^k \pm 1$: Keys such
as  XY  and  YX  will  tend  to  hash  to  the  same  location!  One  way  out  of  this 
difficulty  is  to  look  more  closely  at  the  structure  underlying  Theorem  S.  For 
short  progressions  of  keys,  only  the  first  few  partial  quotients  of  the  continued 
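fraction representation of θ are relevant, and small partial quotients correspond
to good distribution properties. Therefore we find that the best values of θ lie
in the ranges

$$\tfrac14 < \theta < \tfrac{3}{10}, \qquad \tfrac13 < \theta < \tfrac37, \qquad \tfrac47 < \theta < \tfrac23, \qquad \tfrac{7}{10} < \theta < \tfrac34.$$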


A value of A can be found so that each of its bytes lies in a good range and is
not too close to the values of the other bytes or their complements, for example

A = [ + | 61 | 25 | 42 | 33 | 71 ]   (8)

Such a multiplier can be recommended. (These ideas about multiplicative hashing
are due largely to R. W. Floyd.)
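The effective θ for a progression in each character position can be computed exactly. The sketch below assumes the decimal MIX byte size of 100 and a five-byte word, as in (7) and (8); it uses exact fractions, so that θ = {100^j A/w} for a progression located j bytes from the right, and tests each value against the four good ranges of θ quoted in the text.

```python
from fractions import Fraction

# The four "good" ranges for theta quoted in the text.
GOOD_RANGES = [
    (Fraction(1, 4), Fraction(3, 10)),
    (Fraction(1, 3), Fraction(3, 7)),
    (Fraction(4, 7), Fraction(2, 3)),
    (Fraction(7, 10), Fraction(3, 4)),
]


def effective_thetas(a_bytes, byte_size=100):
    """thetas[j] = {byte_size**j * A / w}: the effective theta when the keys
    form an arithmetic progression in the byte j positions from the right."""
    n = len(a_bytes)
    w = byte_size ** n
    a = 0
    for b in a_bytes:
        a = a * byte_size + b
    return [Fraction(a * byte_size ** j, w) % 1 for j in range(n)]


def in_good_range(theta: Fraction) -> bool:
    return any(lo < theta < hi for lo, hi in GOOD_RANGES)
```

For the multiplier (7), the progression in the second character position (three bytes from the right) gives θ = .9887, which lies in none of the ranges, while every byte position of (8) gives a θ inside one of them.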

A good  hash  function  should  satisfy  two  requirements: 

a)  Its  computation  should  be  very  fast. 

b)  It  should  minimize  collisions. 

Property  (a)  is  machine-dependent,  and  property  (b)  is  data-dependent.  If  the 
keys  were  truly  random,  we  could  simply  extract  a few  bits  from  them  and  use 
those  bits  for  the  hash  function;  but  in  practice  we  nearly  always  need  to  have  a 
hash  function  that  depends  on  all  bits  of  the  key  in  order  to  satisfy  (b). 

So far we have considered how to hash one-word keys. Multiword or variable-length
keys can be handled by multiple-precision extensions of the methods
above, but it is generally adequate to speed things up by combining the individual
words together into a single word, then doing a single multiplication or division
as above. The combination can be done by addition mod w, or by exclusive-or
on a binary computer; both of these operations have the advantage that they are
invertible, namely that they depend on all bits of both arguments, and exclusive-or
is sometimes preferable because it avoids arithmetic overflow. However, both
of  these  operations  are  commutative,  hence  (X,Y)  and  (Y,X)  will  hash  to  the 
same  address;  G.  D.  Knott  has  suggested  avoiding  this  problem  by  doing  a cyclic 
shift  just  before  adding  or  exclusive-oring. 
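Knott's remedy can be sketched in a few lines. This assumes 32-bit words and a rotation amount of 5 (both arbitrary choices made for illustration); the combined word would then be hashed by a single multiplication or division, as described above.

```python
W_BITS = 32
MASK = (1 << W_BITS) - 1


def rotl(x: int, r: int) -> int:
    """Cyclic left shift of a W_BITS-bit word."""
    r %= W_BITS
    return ((x << r) | (x >> (W_BITS - r))) & MASK


def combine_xor(words) -> int:
    """Plain exclusive-or: commutative, so (X, Y) and (Y, X) collide."""
    acc = 0
    for x in words:
        acc ^= x & MASK
    return acc


def combine_rotate_xor(words, r: int = 5) -> int:
    """Cyclically shift the running value before each exclusive-or."""
    acc = 0
    for x in words:
        acc = rotl(acc, r) ^ (x & MASK)
    return acc
```

With X = 1 and Y = 2, `combine_xor` gives 3 for both orders, while `combine_rotate_xor` gives 34 for (X, Y) and 65 for (Y, X).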

An even better way to hash $l$-character or $l$-word keys $K = x_1 x_2 \ldots x_l$ is to
compute

$$h(K) = \bigl(h_1(x_1) + h_2(x_2) + \cdots + h_l(x_l)\bigr) \bmod M, \tag{9}$$

where each $h_j$ is an independent hash function. This idea, introduced by J. L.
Carter and M. N. Wegman in 1977, is especially efficient when each $x_j$ is a single
character, because we can then use a precomputed array for each $h_j$. Such arrays
make multiplication unnecessary. If M is a power of 2, we can avoid the division
in (9) by substituting exclusive-or for addition; this gives a different, but equally
good, hash function. Therefore (9) certainly satisfies property (a). Moreover,
Carter and Wegman proved that if the $h_j$ are chosen at random, property (b)
will hold regardless of the input data. (See exercise 72.)
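Equation (9) with precomputed arrays can be sketched as follows. Here each array h_j has 256 entries, one per byte value, and is filled with pseudorandom values from Python's `random` module; that filling, the byte-string key type, and the function names are assumptions of the sketch, not Carter and Wegman's own construction. Keys longer than the number of tables are not handled.

```python
import random


def make_tables(length: int, m: int, seed: int = 0):
    """One random array h_j per character position, with entries in [0, M)."""
    rng = random.Random(seed)
    return [[rng.randrange(m) for _ in range(256)] for _ in range(length)]


def hash_add(tables, key: bytes, m: int) -> int:
    """Equation (9): (h_1(x_1) + ... + h_l(x_l)) mod M; only table lookups."""
    return sum(tables[j][c] for j, c in enumerate(key)) % m


def hash_xor(tables, key: bytes) -> int:
    """Variant for M a power of 2: exclusive-or replaces addition, so no division."""
    acc = 0
    for j, c in enumerate(key):
        acc ^= tables[j][c]
    return acc
```

When M is a power of 2, the exclusive-or of values in [0, M) stays in [0, M), so `hash_xor` needs no final reduction.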
Many more methods for hashing have been suggested, but none of them
have  proved  to  be  superior  to  the  simple  methods  described  above.  For  a survey 
of  several  approaches  together  with  detailed  statistics  on  their  performance  with 
actual  files,  see  the  article  by  V.  Y.  Lum,  P.  S.  T.  Yuen,  and  M.  Dodd,  CACM 
14  (1971),  228-239. 

Of all the other hash methods that have been tried, perhaps the most interesting
is a technique based on algebraic coding theory; the idea is analogous
to the division method above, but we divide by a polynomial modulo 2 instead of
dividing by an integer. (As observed in Section 4.6, this operation is analogous
to division, just as addition is analogous to exclusive-or.) For this method,
M should be a power of 2, say $M = 2^m$, and we make use of an $m$th degree
polynomial $P(x) = x^m + p_{m-1}x^{m-1} + \cdots + p_0$. An $n$-digit binary key
$K = (k_{n-1} \ldots k_1 k_0)_2$ can be regarded as the polynomial
$K(x) = k_{n-1}x^{n-1} + \cdots + k_1x + k_0$, and we compute the remainder

$$K(x) \bmod P(x) = h_{m-1}x^{m-1} + \cdots + h_1x + h_0$$

using polynomial arithmetic modulo 2; then $h(K) = (h_{m-1} \ldots h_1 h_0)_2$. If $P(x)$ is
chosen properly, this hash function can be guaranteed to avoid collisions between
nearly equal keys. For example if n = 15, m = 10, and

$$P(x) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1, \tag{10}$$

it can be shown that $h(K_1)$ will be unequal to $h(K_2)$ whenever $K_1$ and $K_2$
are distinct keys that differ in fewer than seven bit positions. (See exercise 7
for further information about this scheme; it is, of course, more suitable for
hardware or microprogramming implementation than for software.)
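A software sketch of this scheme represents polynomials over GF(2) as integers, with bit i holding the coefficient of x^i, so that subtraction modulo 2 becomes exclusive-or. The constant `P` encodes (10); the function name is illustrative.

```python
P = 0b10100110111    # x^10 + x^8 + x^5 + x^4 + x^2 + x + 1, equation (10)
M_BITS = 10          # m = 10, so M = 2**10


def poly_hash(key: int, p: int = P, m: int = M_BITS) -> int:
    """h(K) = K(x) mod P(x) over GF(2), read back as an m-bit integer.

    Each step cancels the leading term of the running remainder by
    exclusive-oring in a copy of P(x) shifted to the same degree.
    """
    r = key
    while r.bit_length() > m:                 # degree of r is still >= m
        r ^= p << (r.bit_length() - 1 - m)    # cancel the leading term
    return r
```

Because the remainder is linear over GF(2), h(K₁) = h(K₂) exactly when K₁ ⊕ K₂ is a multiple of P(x). The claim in the text therefore amounts to this: every nonzero 15-bit multiple of P(x) has at least seven 1 bits. A brute-force loop over all 2^15 keys is enough to confirm it.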

It  is  often  convenient  to  use  the  constant  hash  function  h(K)  = 0 when 
debugging  a program,  since  all  keys  will  be  stored  together;  an  efficient  h(K) 
can  be  substituted  later. 

Collision  resolution  by  “chaining.”  We  have  observed  that  some  hash 
addresses  will  probably  be  burdened  with  more  than  their  share  of  keys.  Perhaps 
the  most  obvious  way  to  solve  this  problem  is  to  maintain  M linked  lists,  one 
for  each  possible  hash  code.  A LINK  field  should  be  included  in  each  record, 
and  there  will  also  be  M list  heads,  numbered  say  from  1 through  M.  After 
hashing  the  key,  we  simply  do  a sequential  search  in  list  number  h(K)  + 1.  (See 
exercise 6.1-2. The situation is very similar to multiple-list-insertion sorting,
Program 5.2.1M.)

Figure  38  illustrates  this  simple  chaining  scheme  when  M = 9,  for  the 
sequence  of  seven  keys 

K = EN, TO, TRE, FIRE, FEM, SEKS, SYV   (11)

(the numbers 1 through 7 in Norwegian), having the respective hash codes

h(K) + 1 = 3, 1, 4, 1, 5, 9, 2.   (12)

The  first  list  has  two  elements,  and  three  of  the  lists  are  empty. 
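Separate chaining can be sketched in a few lines of Python, with Python lists standing in for the linked lists and their LINK fields. List heads are numbered 1 through M as in Fig. 38, and index 0 is left unused. Since the text does not say which hash function produced (12), the example supplies those hash codes through a lookup table; that table is an assumption of the sketch.

```python
from typing import Callable, Hashable, List


class SeparateChaining:
    """M lists; list number h(K) + 1 holds the keys whose hash code is h(K)."""

    def __init__(self, m: int, h: Callable[[Hashable], int]):
        self.m = m
        self.h = h                       # assumed to return a value in range(m)
        self.heads: List[List[Hashable]] = [[] for _ in range(m + 1)]

    def search_insert(self, k: Hashable) -> bool:
        """Sequentially search list h(K) + 1 and append K if it is absent.
        Returns True if K was already present."""
        chain = self.heads[self.h(k) + 1]
        for key in chain:                # sequential search of one short list
            if key == k:
                return True
        chain.append(k)
        return False
```

Inserting the keys (11) with hash codes h(K) = 2, 0, 3, 0, 4, 8, 1 reproduces the situation of Fig. 38: list 1 holds TO and FIRE, and lists 6, 7, and 8 are empty.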
Fig. 38. Separate chaining. (List heads HEAD[1] through HEAD[9].)


Chaining  is  quite  fast,  because  the  lists  are  short.  If  365  people  are  gathered 
together in one room, there will probably be many pairs having the same birthday,
but the average number of people with any given birthday will be only 1!
In  general,  if  there  are  N keys  and  M lists,  the  average  list  size  is  N/M;  thus 
hashing  decreases  the  average  amount  of  work  needed  for  sequential  searching 
by  roughly  a factor  of  M.  (A  precise  formula  is  worked  out  in  exercise  34.) 

This method is a straightforward combination of techniques we have discussed
before, so we do not need to formulate a detailed algorithm for chained
hash tables. It is often a good idea to keep the individual lists in order by key,
so that unsuccessful searches (which must precede insertions) go faster. Thus
if we choose to make the lists ascending, the TO and FIRE nodes of Fig. 38 would
be interchanged, and all the Λ links would be replaced by pointers to a dummy
record whose key is ∞. (See Algorithm 6.1T.) Alternatively we could make use
of  the  “self-organizing”  concept  discussed  in  Section  6.1;  instead  of  keeping  the 
lists  in  order  by  key,  they  may  be  kept  in  order  according  to  the  time  of  most 
recent  occurrence. 

For  the  sake  of  speed  we  would  like  to  make  M rather  large.  But  when  M is 
large,  many  of  the  lists  will  be  empty  and  much  of  the  space  for  the  M list  heads 
will  be  wasted.  This  suggests  another  approach,  when  the  records  are  small:  We 
can  overlap  the  record  storage  with  the  list  heads,  making  room  for  a total  of 
M records  and  M links  instead  of  for  N records  and  M + N links.  Sometimes 
it  is  possible  to  make  one  pass  over  all  the  data  to  find  out  which  list  heads  will 
be  used,  then  to  make  another  pass  inserting  all  the  “overflow”  records  into  the 
empty  slots.  But  this  is  often  impractical  or  impossible,  and  we’d  rather  have  a 
technique  that  processes  each  record  only  once  when  it  first  enters  the  system. 
The  following  algorithm,  due  to  F.  A.  Williams  [CACM  2,6  (June  1959),  21-24], 
is  a convenient  way  to  solve  the  problem. 

Algorithm C (Chained hash table search and insertion). This algorithm looks
for a given key K in an M-node table. If K is not in the table and the table is
not  full,  K is  inserted. 
Fig. 39. Chained hash table search and insertion.


The nodes of the table are denoted by TABLE[i], for 0 ≤ i ≤ M, and they
are of two distinguishable types, empty and occupied. An occupied node contains
a key field KEY[i], a link field LINK[i], and possibly other fields.

The algorithm makes use of a hash function h(K). An auxiliary variable
R is also used, to help find empty spaces; when the table is empty, we have
R = M + 1, and as insertions are made it will always be true that TABLE[j] is
occupied for all j in the range R < j ≤ M. By convention, TABLE[0] will always
be empty.

C1. [Hash.] Set i ← h(K) + 1. (Now 1 ≤ i ≤ M.)

C2. [Is there a list?] If TABLE[i] is empty, go to C6. (Otherwise TABLE[i] is
occupied; we will look at the list of occupied nodes that starts here.)

C3. [Compare.] If K = KEY[i], the algorithm terminates successfully.

C4. [Advance to next.] If LINK[i] ≠ 0, set i ← LINK[i] and go back to step C3.

C5. [Find empty node.] (The search was unsuccessful, and we want to find an
empty position in the table.) Decrease R one or more times until finding
a value such that TABLE[R] is empty. If R = 0, the algorithm terminates
with overflow (there are no empty nodes left); otherwise set LINK[i] ← R,
i ← R.

C6. [Insert new key.] Mark TABLE[i] as an occupied node, with KEY[i] ← K
and LINK[i] ← 0. ∎
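Algorithm C translates almost step for step into Python. In this sketch `None` marks an empty node, so `None` cannot itself be used as a key, and an `OverflowError` stands for the overflow exit; both conventions are assumptions. The table is 1-indexed with TABLE[0] permanently empty, as in the text.

```python
from typing import Callable, Hashable, List, Optional


class CoalescedHashTable:
    """Sketch of Algorithm C: nodes TABLE[0..M], with TABLE[0] always empty."""

    def __init__(self, m: int, h: Callable[[Hashable], int]):
        self.m = m
        self.h = h                                   # assumed to return 0..M-1
        self.key: List[Optional[Hashable]] = [None] * (m + 1)
        self.link: List[int] = [0] * (m + 1)
        self.r = m + 1                               # TABLE[j] occupied for R < j <= M

    def search_insert(self, k: Hashable) -> int:
        """Return the index of K, inserting K if it is absent."""
        i = self.h(k) + 1                            # C1. Hash.
        if self.key[i] is not None:                  # C2. Is there a list?
            while True:
                if self.key[i] == k:                 # C3. Compare.
                    return i
                if self.link[i] == 0:
                    break
                i = self.link[i]                     # C4. Advance to next.
            # C5. Find empty node: decrease R until TABLE[R] is empty.
            while self.r > 0:
                self.r -= 1
                if self.key[self.r] is None:         # TABLE[0] is always empty
                    break
            if self.r == 0:
                raise OverflowError("no empty nodes left")
            self.link[i] = self.r
            i = self.r
        self.key[i] = k                              # C6. Insert new key.
        self.link[i] = 0
        return i
```

Feeding in the keys (11) with the hash codes of (12) reproduces Fig. 40. FIRE goes to position 9 and is linked from TO in position 1. SEKS, whose home position 9 is already taken, then goes to position 8 and joins that same list.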

This  algorithm  allows  several  lists  to  coalesce,  so  that  records  need  not  be 
moved  after  they  have  been  inserted  into  the  table.  For  example,  see  Fig.  40, 
where  SEKS  appears  in  the  list  containing  TO  and  FIRE  since  the  latter  had  already 
been  inserted  into  position  9. 

In  order  to  see  how  Algorithm  C compares  with  others  in  this  chapter,  we  can 
write  the  following  MIX  program.  The  analysis  worked  out  below  indicates  that 
the  lists  of  occupied  cells  tend  to  be  short,  and  the  program  has  been  designed 
with  this  fact  in  mind. 
Fig. 40. Coalesced chaining. (Table positions TABLE[1] through TABLE[9].)


Program C (Chained hash table search and insertion). For convenience, the
keys are assumed to be only three bytes long, and nodes are represented as
follows:

empty node        [ − | 1 | 0 | 0 | 0 | 0 ]
occupied node     [ + |   LINK   |    KEY    ]        (13)

The table size M is assumed to be prime; TABLE[i] is stored in location TABLE + i.
rI1 ≡ i, rA ≡ K, rI2 ≡ LINK[i] and/or R.


KEY    EQU   3:5
LINK   EQU   0:2
START  LDX   K                1             C1. Hash.
       ENTA  0                1
       DIV   =M=              1
       STX   *+1(0:2)         1
       ENT1  *                1             i ← h(K)
       INC1  1                1               + 1.
       LDA   K                1
       LD2   TABLE,1(LINK)    1             C2. Is there a list?
       J2N   6F               1             To C6 if TABLE[i] empty.
       CMPA  TABLE,1(KEY)     A             C3. Compare.
       JE    SUCCESS          A             Exit if K = KEY[i].
       J2Z   5F               A − S1        To C5 if LINK[i] = 0.
4H     ENT1  0,2              C − 1         C4. Advance to next.
       CMPA  TABLE,1(KEY)     C − 1         C3. Compare.
       JE    SUCCESS          C − 1         Exit if K = KEY[i].
       LD2   TABLE,1(LINK)    C − 1 − S2
       J2NZ  4B               C − 1 − S2    Advance if LINK[i] ≠ 0.
5H     LD2   R                A − S         C5. Find empty node.
       DEC2  1                T             R ← R − 1.
       LDX   TABLE,2          T
       JXNN  *-2              T             Repeat until TABLE[R] empty.
       J2Z   OVERFLOW         A − S         Exit if no empty nodes left.
       ST2   TABLE,1(LINK)    A − S         LINK[i] ← R.
       ENT1  0,2              A − S         i ← R.
       ST2   R                A − S         Update R in memory.
6H     STZ   TABLE,1(LINK)    1 − S         C6. Insert new key. LINK[i] ← 0.
       STA   TABLE,1(KEY)     1 − S         KEY[i] ← K. ∎

The running time of this program depends on

C = number of table entries probed while searching;
A = [initial probe found an occupied node];
S = [search was successful];
T = number of table entries probed while looking for an empty space.

Here S = S1 + S2, where S1 = 1 if successful on the first try. The total running
time for the searching phase of Program C is (7C + 4A + 17 − 3S + 2S1)u, and
the insertion of a new key when S = 0 takes an additional (8A + 4T + 4)u.

Suppose there are N keys in the table at the start of this program, and let

α = N/M = load factor of the table.   (14)

Then the average value of A in an unsuccessful search is obviously α, if the hash
function is random; and exercise 39 proves that the average value of C in an
unsuccessful search is

$$C'_N = 1 + \frac14\Bigl(\Bigl(1 + \frac2M\Bigr)^N - 1 - \frac{2N}{M}\Bigr) \approx 1 + \frac14\bigl(e^{2\alpha} - 1 - 2\alpha\bigr). \tag{15}$$
Thus when the table is half full, the average number of probes made in an
unsuccessful search is about ¼(e + 2) ≈ 1.18; and even when the table gets
completely full, the average number of probes made just before inserting the
final item will be only about ¼(e² + 1) ≈ 2.10. The standard deviation is also
small,  as  shown  in  exercise  40.  These  statistics  prove  that  the  lists  stay  short 
even  though  the  algorithm  occasionally  allows  them  to  coalesce,  when  the  hash 
function  is  random.  Of  course  C can  be  as  high  as  N,  if  the  hash  function  is  bad 
or  if  we  are  extremely  unlucky. 

In a successful search, we always have A = 1. The average number of probes
during a successful search may be computed by summing the quantity C + A
over the first N unsuccessful searches and dividing by N, if we assume that each
key is equally likely. Thus we obtain

$$C_N = \frac1N \sum_{0 \le k < N}\Bigl(C'_k + \frac{k}{M}\Bigr) = 1 + \frac{M}{8N}\Bigl(\Bigl(1 + \frac2M\Bigr)^N - 1 - \frac{2N}{M}\Bigr) + \frac{N-1}{4M} \approx 1 + \frac{1}{8\alpha}\bigl(e^{2\alpha} - 1 - 2\alpha\bigr) + \frac{\alpha}{4} \tag{16}$$

as the average number of probes in a random successful search. Even a full table
will require only about 1.80 probes, on the average, to find an item! Similarly
(see exercise 42), the average value of S1 turns out to be

$$S1_N = 1 - \frac12\,\frac{N-1}{M} \approx 1 - \frac12\alpha. \tag{17}$$
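The numerical claims about Algorithm C can be reproduced from the formulas. The sketch below evaluates C′_N = 1 + ¼((1 + 2/M)^N − 1 − 2N/M) and its limiting form for unsuccessful searches. For successful searches it evaluates both the closed form and the defining average of C′_k + k/M over the first N insertions, so the two can be compared. The function names are illustrative.

```python
import math


def c_unsucc_exact(n: int, m: int) -> float:
    """C'_N = 1 + ((1 + 2/M)**N - 1 - 2N/M) / 4."""
    return 1 + ((1 + 2 / m) ** n - 1 - 2 * n / m) / 4


def c_unsucc_approx(alpha: float) -> float:
    """Limiting form: 1 + (e**(2 alpha) - 1 - 2 alpha) / 4."""
    return 1 + (math.exp(2 * alpha) - 1 - 2 * alpha) / 4


def c_succ_exact(n: int, m: int) -> float:
    """Closed form: 1 + (M/8N)((1 + 2/M)**N - 1 - 2N/M) + (N - 1)/4M."""
    return 1 + m / (8 * n) * ((1 + 2 / m) ** n - 1 - 2 * n / m) + (n - 1) / (4 * m)


def c_succ_by_summing(n: int, m: int) -> float:
    """Average of C'_k + A_k over the first N insertions, A_k = k/M on average."""
    return sum(c_unsucc_exact(k, m) + k / m for k in range(n)) / n


def c_succ_approx(alpha: float) -> float:
    """Limiting form: 1 + (e**(2 alpha) - 1 - 2 alpha)/(8 alpha) + alpha/4."""
    return 1 + (math.exp(2 * alpha) - 1 - 2 * alpha) / (8 * alpha) + alpha / 4
```

For instance, `c_unsucc_approx(0.5)` is about 1.18, `c_unsucc_approx(1.0)` about 2.10, and `c_succ_approx(1.0)` about 1.80, matching the figures quoted in the text.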


At  first  glance  it  may  appear  that  step  C5  is  inefficient,  since  it  has  to  search 
sequentially  for  an  empty  position.  But  actually  the  total  number  of  table  probes 
made in step C5 as a table is being built will never exceed the number of items
in the table; so we make an average of at most one of these probes per insertion.
Exercise 41 proves that T is approximately $\alpha e^{\alpha}$ in a random unsuccessful search.

It  would  be  possible  to  modify  Algorithm  C so  that  no  two  lists  coalesce,  but 
then  it  would  become  necessary  to  move  records  around.  For  example,  consider 
the  situation  in  Fig.  40  just  before  we  wanted  to  insert  SEKS  into  position  9; 
in  order  to  keep  the  lists  separate,  it  would  be  necessary  to  move  FIRE,  and 
for  this  purpose  it  would  be  necessary  to  discover  which  node  points  to  FIRE. 
We  could  solve  this  problem  without  providing  two-way  linkage  by  hashing  FIRE 
and  searching  down  its  list,  as  suggested  by  D.  E.  Ferguson,  since  the  lists  are 
short. Exercise 34 shows that the average number of probes, when lists aren't
coalesced, is reduced to

$$C'_N = 1 + \frac{N(N-1)}{2M^2} \approx 1 + \frac{\alpha^2}{2} \quad\text{(unsuccessful search)}, \tag{18}$$

$$C_N = 1 + \frac{N-1}{2M} \approx 1 + \frac{\alpha}{2} \quad\text{(successful search)}. \tag{19}$$

This is not enough of an improvement over (15) and (16) to warrant changing
the algorithm.

On the other hand, Butler Lampson has observed that most of the space that
is occupied by links can actually be saved in the chaining method, if we avoid
coalescing the lists. This leads to an interesting algorithm that is discussed in
exercise 13. Lampson's method introduces a tag bit in each entry, and causes the
average number of probes needed in an unsuccessful search to decrease slightly,
from (18) to

$$C'_N = \Bigl(1 - \frac1M\Bigr)^N + \frac{N}{M} \approx e^{-\alpha} + \alpha.$$

Separate chaining as in Fig. 38 can be used when N > M, so overflow is
not a serious problem in that case. When the lists coalesce as in Fig. 40 and
Algorithm C, we can link extra items into an auxiliary storage pool; L. Guibas
has proved that the average number of probes to insert the (M + L + 1)st item
is then $(L/2M + \frac14)\bigl((1 + 2/M)^M - 1\bigr) + \frac12$. However, it is usually preferable to
use an alternative scheme that puts the first colliding elements into an auxiliary
storage area, allowing lists to coalesce only when this auxiliary area has filled
up; see exercise 43.

Collision resolution by “open addressing.” Another way to resolve the
problem of collisions is to do away with links entirely, simply looking at various
entries of the table one by one until either finding the key K or finding an empty
position. The idea is to formulate some rule by which every key K determines a
“probe sequence,” namely a sequence of table positions that are to be inspected
whenever K is inserted or looked up. If we encounter an empty position while
searching for K, using the probe sequence determined by K, we can conclude
that K is not in the table, since the same sequence of probes will be made every
time K is processed. This general class of methods was named open addressing
by  W.  W.  Peterson  [IBM  J.  Research  & Development  1 (1957),  130-146]. 

The  simplest  open  addressing  scheme,  known  as  linear  probing,  uses  the 
cyclic  probe  sequence 

$$h(K),\ h(K) - 1,\ \ldots,\ 0,\ M - 1,\ M - 2,\ \ldots,\ h(K) + 1 \tag{20}$$

as  in  the  following  algorithm. 

Algorithm L (Linear probing and insertion). This algorithm searches an M-node
table, looking for a given key K. If K is not in the table and the table is
not full, K is inserted.

The nodes of the table are denoted by TABLE[i], for 0 ≤ i < M, and they
are of two distinguishable types, empty and occupied. An occupied node contains
a key, called KEY[i], and possibly other fields. An auxiliary variable N is used
to keep track of how many nodes are occupied; this variable is considered to be
part of the table, and it is increased by 1 whenever a new key is inserted.

This algorithm makes use of a hash function h(K), and it uses the linear
probing sequence (20) to address the table. Modifications of that sequence are
discussed below.

L1. [Hash.] Set i ← h(K). (Now 0 ≤ i < M.)

L2. [Compare.] If TABLE[i] is empty, go to step L4. Otherwise if KEY[i] = K,
the algorithm terminates successfully.

L3. [Advance to next.] Set i ← i − 1; if now i < 0, set i ← i + M. Go back to
step L2.

L4. [Insert.] (The search was unsuccessful.) If N = M − 1, the algorithm
terminates with overflow. (This algorithm considers the table to be full
when N = M − 1, not when N = M; see exercise 15.) Otherwise set
N ← N + 1, mark TABLE[i] occupied, and set KEY[i] ← K. ∎
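Algorithm L can be sketched in Python as follows. In this sketch `None` marks an empty node and `OverflowError` stands for the overflow exit; both are assumptions. Because at most M − 1 nodes are ever occupied, the probe loop always reaches an empty node.

```python
from typing import Callable, Hashable, List, Optional


class LinearProbingTable:
    """Sketch of Algorithm L: nodes TABLE[0..M-1], None marks an empty node."""

    def __init__(self, m: int, h: Callable[[Hashable], int]):
        self.m = m
        self.h = h                                   # assumed to return 0..M-1
        self.key: List[Optional[Hashable]] = [None] * m
        self.n = 0                                   # number of occupied nodes

    def search_insert(self, k: Hashable) -> int:
        """Return the index holding K, inserting K if it is absent."""
        i = self.h(k)                                # L1. Hash.
        while self.key[i] is not None:               # L2. Compare.
            if self.key[i] == k:
                return i
            i -= 1                                   # L3. Advance to next.
            if i < 0:
                i += self.m
        if self.n == self.m - 1:                     # L4. Insert.
            raise OverflowError("table full (N = M - 1)")
        self.n += 1
        self.key[i] = k
        return i
```

Using the hash codes 2, 7, 1, 8, 2, 8, 1 quoted in the text for the keys (11), with M = 9, this sketch leaves FEM, SEKS, and SYV displaced from their home positions, as in Fig. 41.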

Figure  41  shows  what  happens  when  the  seven  example  keys  (11)  are  inserted 
by  Algorithm  L,  using  the  respective  hash  codes  2,  7,  1,  8,  2,  8,  1:  The  last  three 
keys,  FEM,  SEKS,  and  SYV,  have  been  displaced  from  their  initial  locations  h(K). 


  0      1      2      3      4      5      6      7      8
  FEM    TRE    EN                   SYV    SEKS   TO     FIRE

Fig. 41. Linear open addressing.
Program L (Linear probing and insertion). This program deals with full-word
keys; but a key of 0 is not allowed, since 0 is used to signal an empty position
in the table. (Alternatively, we could require the keys to be nonnegative, letting
empty positions contain −1.) The table size M is assumed to be prime, and
TABLE[i] is stored in location TABLE + i for 0 ≤ i < M. For speed in the inner
loop, location TABLE − 1 is assumed to contain 0. Location VACANCIES is assumed
to contain the value M − 1 − N; and rA ≡ K, rI1 ≡ i.

In order to speed up the inner loop of this program, the test “i < 0” has been
removed from the loop so that only the essential parts of steps L2 and L3 remain.
The total running time for the searching phase comes to (7C + 9E + 21 − 4S)u,
and the insertion after an unsuccessful search adds an extra 8u.


START  LDX   K              1             L1. Hash.
       ENTA  0              1
       DIV   =M=            1
       STX   *+1(0:2)       1
       ENT1  *              1             i ← h(K).
       LDA   K              1
       JMP   2F             1
8H     INC1  M+1            E             L3. Advance to next.
3H     DEC1  1              C + E − 1     i ← i − 1.
2H     CMPA  TABLE,1        C + E         L2. Compare.
       JE    SUCCESS        C + E         Exit if K = KEY[i].
       LDX   TABLE,1        C + E − S
       JXNZ  3B             C + E − S     To L3 if TABLE[i] nonempty.
       J1N   8B             E + 1 − S     To L3 with i ← M if i = −1.
4H     LDX   VACANCIES      1 − S         L4. Insert.
       JXZ   OVERFLOW       1 − S         Exit with overflow if N = M − 1.
       DECX  1              1 − S
       STX   VACANCIES      1 − S         Increase N by 1.
       STA   TABLE,1        1 − S         TABLE[i] ← K. ∎

As in Program C, the variable C denotes the number of probes, and S tells
whether or not the search was successful. We may ignore the variable E, which
is 1 only if a spurious probe of TABLE[−1] has been made, since its average value
is (C − 1)/M.

Experience  with  linear  probing  shows  that  the  algorithm  works  fine  until 
the  table  begins  to  get  full;  but  eventually  the  process  slows  down,  with  long 
drawn-out  searches  becoming  increasingly  frequent.  The  reason  for  this  behavior 
can  be  understood  by  considering  the  following  hypothetical  hash  table  in  which 
M = 19 and N = 9:

[Table positions 0 through 18, with the nine occupied positions shaded.]   (21)


Shaded squares represent occupied positions. The next key K to be inserted
into the table will go into one of the ten empty spaces, but these are not equally
likely; in fact, K will be inserted into position 11 if 11 ≤ h(K) ≤ 15, while it
will fall into position 8 only if h(K) = 8. Therefore position 11 is five times as
likely as position 8; long lists tend to grow even longer.

This  phenomenon  isn’t  enough  by  itself  to  account  for  the  relatively  poor 
behavior  of  linear  probing,  since  a similar  thing  occurs  in  Algorithm  C.  (A  list 
of  length  4 is  four  times  as  likely  to  grow  in  Algorithm  C as  a list  of  length  1.) 
The  real  problem  occurs  when  a cell  like  4 or  16  becomes  occupied  in  (21);  then 
two  separate  lists  are  combined,  while  the  lists  in  Algorithm  C never  grow  by 
more  than  one  step  at  a time.  Consequently  the  performance  of  linear  probing 
degrades  rapidly  when  N approaches  M. 

We shall prove later in this section that the average number of probes needed
by Algorithm L is approximately

$$C'_N \approx \frac12\Bigl(1 + \Bigl(\frac{1}{1-\alpha}\Bigr)^2\Bigr) \quad\text{(unsuccessful search)}, \tag{22}$$

$$C_N \approx \frac12\Bigl(1 + \frac{1}{1-\alpha}\Bigr) \quad\text{(successful search)}, \tag{23}$$

where α = N/M is the load factor of the table. Therefore Program L is almost
as fast as Program C, when the table is less than 75 percent full, in spite of the
fact that Program C deals with unrealistically short keys. On the other hand,
when α approaches 1 the best thing we can say about Program L is that it works,
slowly but surely. In fact, when N = M − 1, there is only one vacant space in the
table, so the average number of probes in an unsuccessful search is (M + 1)/2;
we shall also prove that the average number of probes in a successful search is
approximately $\sqrt{\pi M/8}$ when the table is full.
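Formulas (22) and (23) can be checked with a small simulation. This sketch assumes that hash codes are uniformly random, drawn with Python's `random` module from a fixed seed. It inserts N keys using the probe sequence (20), counts the probes for every successful search, and counts the probes for an unsuccessful search starting from each of the M possible hash codes.

```python
import random


def simulate_linear_probing(m: int, n: int, trials: int = 5, seed: int = 1):
    """Average probes (successful, unsuccessful) after n random insertions
    into an m-slot table with probe sequence h, h-1, ... (wrapping). Needs n < m."""
    rng = random.Random(seed)
    succ_sum = unsucc_sum = 0.0
    for _ in range(trials):
        occupied = [False] * m
        probes_total = 0
        for _ in range(n):
            i, probes = rng.randrange(m), 1
            while occupied[i]:
                i = (i - 1) % m
                probes += 1
            occupied[i] = True
            probes_total += probes        # a later successful search retraces this path
        unsucc_total = 0
        for start in range(m):            # every hash code equally likely
            i, probes = start, 1
            while occupied[i]:
                i = (i - 1) % m
                probes += 1
            unsucc_total += probes
        succ_sum += probes_total / n
        unsucc_sum += unsucc_total / m
    return succ_sum / trials, unsucc_sum / trials


def predicted(alpha: float):
    """Approximations (23) and (22): (successful, unsuccessful)."""
    return 0.5 * (1 + 1 / (1 - alpha)), 0.5 * (1 + 1 / (1 - alpha) ** 2)
```

At α ≈ 1/2 the simulated averages should come out close to the predicted 1.5 and 2.5 probes. Rerunning with α near 0.9 shows how quickly the unsuccessful case deteriorates.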

The  pileup  phenomenon  that  makes  linear  probing  costly  on  a nearly  full 
table  is  aggravated  by  the  use  of  division  hashing,  if  consecutive  key  values 
{K,  K+ 1,  K+ 2, . . .}  are  likely  to  occur,  since  these  keys  will  have  consecutive 
hash  codes.  Multiplicative  hashing  will  break  up  these  clusters  satisfactorily. 

Another way to protect against the consecutive hash code problem is to set
i ← i − c in step L3, instead of i ← i − 1. Any positive value of c will do, so
long as it is relatively prime to M, since the probe sequence will still examine
every position of the table in this case. Such a change would make Program L a
bit slower, because of the test for i < 0. Decreasing by c instead of by 1 won't
alter  the  pileup  phenomenon,  since  groups  of  c-apart  records  will  still  be  formed; 
equations  (22)  and  (23)  will  still  apply.  But  the  appearance  of  consecutive  keys 
{K,  K+ 1,  K+  2, . . .}  will  now  actually  be  a help  instead  of  a hindrance. 

Although  a fixed  value  of  c does  not  reduce  the  pileup  phenomenon,  we 
can  improve  the  situation  nicely  by  letting  c depend  on  K.  This  idea  leads  to 
an  important  modification  of  Algorithm  L,  first  introduced  by  Guy  de  Balbine 
[Ph.D. thesis, Calif. Inst. of Technology (1968), 149-150]:

Algorithm D (Open addressing with double hashing). This algorithm is almost
identical to Algorithm L, but it probes the table in a slightly different fashion by
making use of two hash functions $h_1(K)$ and $h_2(K)$. As usual $h_1(K)$ produces a
value between 0 and M − 1, inclusive; but $h_2(K)$ must produce a value between
1 and M − 1 that is relatively prime to M. (For example, if M is prime, $h_2(K)$
can be any value between 1 and M − 1 inclusive; or if $M = 2^m$, $h_2(K)$ can be
any odd value between 1 and $2^m - 1$.)

D1. [First hash.] Set i ← h1(K).

D2. [First probe.] If TABLE[i] is empty, go to D6. Otherwise if KEY[i] = K, the
algorithm terminates successfully.

D3. [Second hash.] Set c ← h2(K).

D4. [Advance to next.] Set i ← i − c; if now i < 0, set i ← i + M.

D5. [Compare.] If TABLE[i] is empty, go to D6. Otherwise if KEY[i] = K, the
algorithm terminates successfully. Otherwise go back to D4.

D6. [Insert.] If N = M − 1, the algorithm terminates with overflow. Otherwise
set N ← N + 1, mark TABLE[i] occupied, and set KEY[i] ← K. |
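Steps D1–D6 translate almost line for line into a high-level language. The following Python sketch is illustrative only. It assumes nonnegative integer keys, a prime M ≥ 3, and the particular pair h1(K) = K mod M, h2(K) = 1 + (K mod (M − 2)), one of the choices discussed below. It uses `None` to mark an empty cell.

```python
class DoubleHashTable:
    """Open addressing with double hashing, following steps D1-D6 of Algorithm D.

    Assumptions of this sketch: nonnegative integer keys, M prime >= 3,
    h1(K) = K mod M and h2(K) = 1 + (K mod (M - 2)); None marks an empty cell.
    """

    def __init__(self, M):
        self.M = M
        self.table = [None] * M
        self.N = 0

    def h1(self, K):
        return K % self.M

    def h2(self, K):
        # Lies in 1..M-2, hence is relatively prime to the prime M.
        return 1 + K % (self.M - 2)

    def search_insert(self, K):
        """Return (position, found); an absent K is inserted as in step D6."""
        i = self.h1(K)                      # D1. First hash.
        if self.table[i] is None:           # D2. First probe.
            return self._insert(i, K)
        if self.table[i] == K:
            return i, True
        c = self.h2(K)                      # D3. Second hash.
        while True:
            i -= c                          # D4. Advance to next.
            if i < 0:
                i += self.M
            if self.table[i] is None:       # D5. Compare.
                return self._insert(i, K)
            if self.table[i] == K:
                return i, True

    def _insert(self, i, K):                # D6. Insert.
        if self.N == self.M - 1:
            raise OverflowError("table would have no vacant cell left")
        self.N += 1
        self.table[i] = K
        return i, False
```

For instance, with M = 13 the keys 5, 18, 31 and 44 all have h1 = 5. Key 5 takes position 5, and the second hashes 8, 10 and 1 send 18, 31 and 44 to positions 10, 8 and 4. The loop always terminates, because at most M − 1 cells are ever occupied and c is relatively prime to M.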


Several possibilities have been suggested for computing h2(K). If M is
prime and h1(K) = K mod M, we might let h2(K) = 1 + (K mod (M − 1)); but
since M − 1 is even, it would be better to let h2(K) = 1 + (K mod (M − 2)).
This suggests choosing M so that M and M − 2 are “twin primes” like 1021
and 1019. Alternatively, we could set h2(K) = 1 + (⌊K/M⌋ mod (M − 2)),
since the quotient ⌊K/M⌋ might be available in a register as a by-product of the
computation of h1(K).

If M = 2^m and we are using multiplicative hashing, h2(K) can be computed
simply by shifting left m more bits and “oring in” a 1, so that the coding sequence
in (5) would be followed by

    ENTA  0      Clear rA.
    SLB   m      Shift rAX m bits left.        (24)
    OR    =1=    rA ← rA | 1.

This is faster than the division method.
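In a language with unbounded integers, the same idea can be imitated by keeping the low word of the product AK and reading off two consecutive m-bit fields. The sketch assumes a 32-bit word w = 2^32 and the multiplier A = 2654435769, which is close to w/φ. Both values are illustrative choices, not values from the text. The shifts emulate what the MIX sequence does with rAX.

```python
W_BITS = 32          # assumed word size: w = 2**32
A = 2654435769       # assumed multiplier, approximately w / phi


def double_hash_pow2(K, m):
    """Return (h1, h2) for a table of size M = 2**m from one multiplication.

    h1 is the leading m bits of the fraction (A*K / w) mod 1 (multiplicative hashing);
    h2 is the next m bits with the low bit forced to 1, so h2 is odd and therefore
    relatively prime to M. Requires 2*m <= W_BITS.
    """
    if 2 * m > W_BITS:
        raise ValueError("need 2*m <= W_BITS")
    frac = (A * K) % (1 << W_BITS)                           # low word of A*K
    h1 = frac >> (W_BITS - m)                                # first m bits of the fraction
    h2 = ((frac >> (W_BITS - 2 * m)) & ((1 << m) - 1)) | 1   # next m bits, "or" in a 1
    return h1, h2
```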

In each of the techniques suggested above, h1(K) and h2(K) are essentially
independent, in the sense that different keys will yield the same values for both h1
and h2 with probability approximately proportional to 1/M^2 instead of to 1/M.
Empirical tests show that the behavior of Algorithm D with independent hash
functions is essentially indistinguishable from what would be observed if the keys
were inserted at random into the table; there is practically
no “piling up” or “clustering” as in Algorithm L.

It is also possible to let h2(K) depend on h1(K), as suggested by Gary Knott
in 1968; for example, if M is prime we could let

$$h_2(K) = \begin{cases} 1, & \text{if } h_1(K) = 0;\\ M - h_1(K), & \text{if } h_1(K) > 0. \end{cases} \qquad (25)$$


This would be faster than doing another division, but we shall see that it does
cause a certain amount of secondary clustering, requiring slightly more probes
because of the increased chance that two or more keys will follow the same path.
The formulas derived below can be used to determine whether the gain in hashing
time outweighs the loss of probing time.
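The cause of this secondary clustering shows up in a small experiment. Under (25) the whole probe sequence is a function of h1(K) alone, so keys that collide on the first probe also collide on every later probe. The sketch compares (25) with the independent choice h2(K) = 1 + (K mod (M − 2)). The values M = 13 and the keys 5 and 18 are arbitrary illustrative choices.

```python
def probe_path(h1, h2, M, length):
    """First `length` positions probed by Algorithm D: (h1 - j*h2) mod M."""
    return [(h1 - j * h2) % M for j in range(length)]


def knott_h2(h1, M):
    """Second hash (25): 1 if h1 = 0, else M - h1; it depends only on h1."""
    return 1 if h1 == 0 else M - h1


def independent_h2(K, M):
    """Second hash that depends on K itself rather than on h1(K)."""
    return 1 + K % (M - 2)


M = 13
keys = (5, 18)                     # both have h1(K) = K mod 13 = 5
knott_paths = [probe_path(K % M, knott_h2(K % M, M), M, 5) for K in keys]
indep_paths = [probe_path(K % M, independent_h2(K, M), M, 5) for K in keys]
print(knott_paths)   # identical paths: with c = M - h1 the jth probe is (j+1)*h1 mod M
print(indep_paths)   # the paths separate after the first probe
```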
Algorithms L and D are very similar, yet there are enough differences that
it is instructive to compare the running time of the corresponding MIX programs.

Program D (Open addressing with double hashing). Since this program is
substantially like Program L, it is presented without comments. rI2 = c − 1.


    01  START  LDX   K            1
    02         ENTA  0            1
    03         DIV   =M=          1
    04         STX   *+1(0:2)     1
    05         ENT1  *            1
    06         LDX   TABLE,1      1
    07         CMPX  K            1
    08         JE    SUCCESS      1
    09         JXZ   4F           1-S1
    10         SRAX  5            A-S1
    11         DIV   =M-2=        A-S1
    12         STX   *+1(0:2)     A-S1
    13         ENT2  *            A-S1
    14         LDA   K            A-S1
    15  3H     DEC1  1,2          C-1
    16         J1NN  *+2          C-1
    17         INC1  M            B
    18         CMPA  TABLE,1      C-1
    19         JE    SUCCESS      C-1
    20         LDX   TABLE,1      C-1-S2
    21         JXNZ  3B           C-1-S2
    22  4H     LDX   VACANCIES    1-S
    23         JXZ   OVERFLOW     1-S
    24         DECX  1            1-S
    25         STX   VACANCIES    1-S
    26         LDA   K            1-S
    27         STA   TABLE,1      1-S  |

The frequency counts A, C, S1, S2 in this program have a similar interpretation
to those in Program C above. The other variable B will be about (C − 1)/2 on the
average. (If we restricted the range of h2(K) to, say, 1 ≤ h2(K) < M/2, B would
be only about (C − 1)/4; this increase of speed will probably not be offset by a
noticeable increase in the number of probes.) When there are N = αM keys in
the table, the average value of A is, of course, α in an unsuccessful search, and
A = 1 in a successful search. As in Algorithm C, the average value of S1 in a
successful search is 1 − ½((N − 1)/M) ≈ 1 − ½α. The average number of probes
is difficult to determine exactly, but empirical tests show good agreement with
formulas derived below for “uniform probing,” namely

$$C'_N = \frac{M+1}{M+1-N} \approx (1-\alpha)^{-1} \qquad \text{(unsuccessful search)}, \qquad (26)$$

$$C_N = \frac{M+1}{N}\bigl(H_{M+1} - H_{M+1-N}\bigr) \approx -\alpha^{-1}\ln(1-\alpha) \qquad \text{(successful search)}, \qquad (27)$$


when h1(K) and h2(K) are independent. When h2(K) depends on h1(K) as
in (25), the secondary clustering causes (26) and (27) to be increased to


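$$C'_N = \frac{M+1}{M+1-N} - \frac{N}{M+1} + H_{M+1} - H_{M+1-N} + O(M^{-1}) \approx (1-\alpha)^{-1} - \alpha - \ln(1-\alpha); \qquad (28)$$

$$C_N = 1 + H_{M+1} - H_{M+1-N} - \frac{N}{2(M+1)} - \frac{H_{M+1} - H_{M+1-N}}{N} + O(N^{-1}) \approx 1 - \ln(1-\alpha) - \tfrac12\alpha. \qquad (29)$$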

(See exercise 44.) Note that as the table gets full, the successful-search values C_N
in (27) and (29) approach H_{M+1} − 1 and H_{M+1} − ½, respectively, when N = M;
this is much better than we observed in Algorithm L, but not as good as in the
chaining methods.
Fig. 42. The running time for successful searching by three open addressing schemes (horizontal axis: load factor, α = N/M).

Since each probe takes slightly less time in Algorithm L, double hashing
is advantageous only when the table gets full. Figure 42 compares the average
running time of Program L, Program D, and a modified Program D that involves
secondary clustering, replacing the rather slow calculation of h2(K) in lines 10-13
by the following three instructions:

    ENN2  1-M,1    c ← M − i.
    J1NZ  *+2                        (30)
    ENT2  0        If i = 0, c ← 1.

Program D takes a total of 8C + 19A + B + 26 − 13S − 17S1 units of time;
modification (30) saves about 15(A − S1) ≈ 7.5α of these in a successful search.
In this case, secondary clustering is preferable to independent double hashing.

On a binary computer, we could speed up the computation of h2(K) in
another way, if M is a prime greater than, say, 512, replacing lines 10-13 by

    AND   =511=      rA ← rA mod 512.
    STA   *+1(0:2)                        (31)
    ENT2  *          c ← rA + 1.

This  idea  (suggested  by  Bell  and  Kaman,  CACM  13  (1970),  675-677,  who 
discovered  Algorithm  D independently)  avoids  secondary  clustering  without  the 
expense  of  another  division. 
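In high-level terms, (31) computes c = 1 + (⌊K/M⌋ mod 512). This reading assumes, as in Program D, that rA still holds the quotient ⌊K/M⌋ left by the division in line 03. Every such c lies in 1..512, so it is relatively prime to any prime M > 512. The check below uses the twin prime M = 1021 purely as an example, and the function name is hypothetical.

```python
from math import gcd


def bell_kaman_h2(K, M):
    """Second hash in the spirit of (31): c = (floor(K/M) mod 512) + 1.

    Assumes M is a prime greater than 512; the mask 511 plays the role of AND =511=.
    """
    if M <= 512:
        raise ValueError("M must exceed 512")
    return ((K // M) & 511) + 1


M = 1021
steps = {bell_kaman_h2(K, M) for K in range(0, 2_000_000, 997)}
print(all(1 <= c <= 512 and gcd(c, M) == 1 for c in steps))
```

Because the quotient and the remainder of K/M are different parts of K, this c is essentially independent of h1(K) = K mod M. That independence is why the method avoids secondary clustering.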

Many other probe sequences have been proposed as improvements on Algorithm L,
but none seem to be superior to Algorithm D except possibly the method
described in exercise 20.

By using the relative order of keys we can reduce the average running
time for unsuccessful searches by Algorithms L or D to the average running
time for successful search; see exercise 66. This technique can be important in
applications for which unsuccessful searches are common; for example, TeX uses
such an algorithm when looking for exceptions to its hyphenation rules.
Fig. 43. The number of times a compiler typically searches for variable names. The
names are listed from left to right in order of their first appearance.

Brent’s Variation. Richard P. Brent has discovered a way to modify Algorithm D
so that the average successful search time remains bounded as the table
gets full. His method [CACM 16 (1973), 105-109] is based on the fact that
successful searches are much more common than insertions, in many applications;
therefore he proposes doing more work when inserting an item, moving records
in order to reduce the expected retrieval time.

For  example,  Fig.  43  shows  the  number  of  times  each  identifier  was  actually 
found  to  appear,  in  a typical  PL/I  procedure.  This  data  indicates  that  a PL/I 
compiler  that  uses  a hash  table  to  keep  track  of  variable  names  will  be  looking  up 
many  of  the  names  five  or  more  times  but  inserting  them  only  once.  Similarly, 
Bell  and  Kaman  found  that  a COBOL  compiler  used  its  symbol  table  algorithm 
10988  times  while  compiling  a program,  but  made  only  735  insertions  into  the 
table;  this  is  an  average  of  about  14  successful  searches  per  unsuccessful  search. 
Sometimes  a table  is  actually  created  only  once  (for  example,  a table  of  symbolic 
opcodes  in  an  assembler),  and  it  is  used  thereafter  purely  for  retrieval. 

Brent’s idea is to change the insertion process in Algorithm D as follows.
Suppose an unsuccessful search has probed locations p_0, p_1, ..., p_{t−1}, p_t, where
p_j = (h1(K) − j·h2(K)) mod M and TABLE[p_t] is empty. If t ≤ 1, we insert K in
position p_t as usual; but if t ≥ 2, we compute c_0 = h2(K_0), where K_0 = KEY[p_0],
and see if TABLE[(p_0 − c_0) mod M] is empty. If it is, we set it to TABLE[p_0] and
then insert K in position p_0. This increases the retrieval time for K_0 by one step,
but it decreases the retrieval time for K by t ≥ 2 steps, so it results in a net
improvement. Similarly, if TABLE[(p_0 − c_0) mod M] is occupied and t ≥ 3, we
try TABLE[(p_0 − 2c_0) mod M]; if that is full too, we compute c_1 = h2(KEY[p_1])
and try TABLE[(p_1 − c_1) mod M]; etc. In general, let c_j = h2(KEY[p_j]) and
p_{j,k} = (p_j − k·c_j) mod M; if we have found TABLE[p_{j,k}] occupied for all indices j
and k such that j + k < r, and if t ≥ r + 1, we look at TABLE[p_{0,r}], TABLE[p_{1,r−1}],
..., TABLE[p_{r−1,1}]. If the first empty space occurs at position p_{j,r−j} we set
TABLE[p_{j,r−j}] ← TABLE[p_j] and insert K in position p_j.
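The following Python sketch layers this insertion rule on top of Algorithm D. It illustrates the idea under stated assumptions and is not Brent's published code. The assumptions are integer keys, a prime M ≥ 3, h1(K) = K mod M, h2(K) = 1 + (K mod (M − 2)), and `None` for an empty cell. The method `search` follows steps D1–D5 and reports the number of probes, which makes it easy to check that every moved record can still be found.

```python
class BrentTable:
    """Algorithm D with Brent's insertion rule (illustrative sketch).

    Assumptions: integer keys, M prime >= 3, h1(K) = K mod M,
    h2(K) = 1 + (K mod (M - 2)); None marks an empty cell.
    """

    def __init__(self, M):
        self.M, self.N = M, 0
        self.table = [None] * M

    def h1(self, K):
        return K % self.M

    def h2(self, K):
        return 1 + K % (self.M - 2)

    def search(self, K):
        """Number of probes for a successful search, or None if K is absent."""
        i, c = self.h1(K), self.h2(K)
        for probes in range(1, self.M + 1):
            if self.table[i] is None:
                return None
            if self.table[i] == K:
                return probes
            i = (i - c) % self.M
        return None

    def insert(self, K):
        M, c = self.M, self.h2(K)
        path = [self.h1(K)]                    # p_0, p_1, ..., p_t
        while self.table[path[-1]] is not None:
            if self.table[path[-1]] == K:
                return                         # already present
            path.append((path[-1] - c) % M)
        if self.N == M - 1:
            raise OverflowError("table full")
        self.N += 1
        t = len(path) - 1                      # TABLE[p_t] is empty
        for r in range(1, t):                  # a move can pay off only if t >= r + 1
            for j in range(r):                 # look at p_{0,r}, p_{1,r-1}, ..., p_{r-1,1}
                resident = self.table[path[j]]
                target = (path[j] - (r - j) * self.h2(resident)) % M
                if self.table[target] is None:
                    self.table[target] = resident   # move KEY[p_j] farther along its own path
                    self.table[path[j]] = K         # K takes over position p_j
                    return
        self.table[path[t]] = K                # no profitable move: insert as usual
```

For example, take M = 13 and insert 5, 18, 31 and then 161. The probe sequence of 161 is 5, 10, 2, so it meets two occupied cells. The sketch then moves key 5 from position 5 to position 12 and puts 161 in position 5. The two keys now need 2 + 1 probes instead of 1 + 3.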

Brent’s  analysis  indicates  that  the  average  number  of  probes  per  successful 
search  is  reduced  to  the  levels  shown  in  Fig.  44,  on  page  545,  with  a maximum 
value  of  about  2.49. 

The number t + 1 of probes in an unsuccessful search is not reduced by Brent’s
variation; it remains at the level indicated by Eq. (26), approaching ½(M + 1)
as the table gets full. The average number of times h2 needs to be computed
per insertion is α^2 + α^5 + O(α^6), according to Brent’s analysis, eventually
approaching Θ(√M); and the number of additional table positions probed while
deciding how to make the insertion is about α^2 + α^4 + O(α^5).

E. G. Mallach [Comp. J. 20 (1977), 137-140] has experimented with refinements
of Brent’s variation, and further results have been obtained by Gaston H.
Gonnet and J. Ian Munro [SICOMP 8 (1979), 463-478].

Deletions.  Many  computer  programmers  have  great  faith  in  algorithms,  and 
they  are  surprised  to  find  that  the  obvious  way  to  delete  records  from  a hash 
table  doesn’t  work.  For  example,  if  we  try  to  delete  the  key  EN  from  Fig.  41, 
we  can’t  simply  mark  that  table  position  empty,  because  another  key  FEM  would 
suddenly  be  forgotten!  (Recall  that  EN  and  FEM  both  hashed  to  the  same  location. 
When  looking  up  FEM,  we  would  find  an  empty  place,  indicating  an  unsuccessful 
search.)  A similar  problem  occurs  with  Algorithm  C,  due  to  the  coalescing  of 
lists;  imagine  the  deletion  of  both  TO  and  FIRE  from  Fig.  40. 

In  general,  we  can  handle  deletions  by  putting  a special  code  value  in  the 
corresponding  cell,  so  that  there  are  three  kinds  of  table  entries:  empty,  occupied, 
and  deleted.  When  searching  for  a key,  we  should  skip  over  deleted  cells,  as  if 
they  were  occupied.  If  the  search  is  unsuccessful,  the  key  can  be  inserted  in  place 
of  the  first  deleted  or  empty  position  that  was  encountered. 
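Below is a minimal Python sketch of the three-state scheme. For brevity it uses linear probing with h(K) = K mod M, which is an assumption; any probe sequence would do. Two sentinel objects stand for “empty” and “deleted”. Searches skip deleted cells, and insertions reuse the first deleted or empty cell seen. The loop is bounded by M probes because, as explained next, the table may eventually contain no empty cell at all.

```python
EMPTY = object()     # a cell that has never been occupied
DELETED = object()   # a cell whose record has been deleted


class TombstoneTable:
    """Open addressing with empty / occupied / deleted cells (linear probing sketch)."""

    def __init__(self, M):
        self.M = M
        self.cells = [EMPTY] * M

    def _positions(self, K):
        """Probe sequence h(K), h(K) - 1, ..., wrapping around; at most M cells."""
        i = K % self.M
        for _ in range(self.M):
            yield i
            i = (i - 1) % self.M

    def search(self, K):
        for i in self._positions(K):
            cell = self.cells[i]
            if cell is EMPTY:
                return None
            if cell is not DELETED and cell == K:   # deleted cells are skipped
                return i
        return None

    def insert(self, K):
        free = None                                  # first deleted or empty cell seen
        for i in self._positions(K):
            cell = self.cells[i]
            if cell is EMPTY or cell is DELETED:
                if free is None:
                    free = i
                if cell is EMPTY:
                    break                            # K cannot appear farther along
            elif cell == K:
                return i                             # already present
        if free is None:
            raise OverflowError("no empty or deleted cell")
        self.cells[free] = K
        return free

    def delete(self, K):
        i = self.search(K)
        if i is not None:
            self.cells[i] = DELETED
```

With M = 7, the keys 3 and 10 both hash to 3, so 10 lands in position 2. After deleting 3, a search for 10 still succeeds, which would fail if cell 3 were simply marked empty. A later insertion of 17 reuses the deleted cell.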

But this idea is workable only when deletions are very rare, because the
entries of the table never become empty again once they have been occupied.
After a long sequence of repeated insertions and deletions, all of the empty spaces
will eventually disappear, and every unsuccessful search will take M probes!
Furthermore the time per probe will be increased, since we will have to test
whether i has returned to its starting value in step D4; and the number of
probes in a successful search will drift upward from C_N to C'_N.

When  linear  probing  is  being  used  (Algorithm  L),  we  can  make  deletions  in 
a way  that  avoids  such  a sorry  state  of  affairs,  if  we  are  willing  to  do  some  extra 
work  for  the  deletion. 

Algorithm R (Deletion with linear probing). Assuming that an open hash table
has been constructed by Algorithm L, this algorithm deletes the record from a
given position TABLE[i].

R1. [Empty a cell.] Mark TABLE[i] empty, and set j ← i.

R2. [Decrease i.] Set i ← i − 1, and if this makes i negative set i ← i + M.

R3. [Inspect TABLE[i].] If TABLE[i] is empty, the algorithm terminates. Otherwise
set r ← h(KEY[i]), the original hash address of the key now stored at
position i. If i ≤ r < j or if r < j < i or j < i ≤ r (in other words, if r lies
cyclically between i and j), go back to R2.

R4. [Move a record.] Set TABLE[j] ← TABLE[i], and return to step R1. |
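Algorithm R is short enough to transcribe directly. The sketch below assumes a Python list with `None` for empty cells and a caller-supplied hash function `h`. The helpers `insert_linear` and `search_linear` are hypothetical names. They mimic Algorithm L's decreasing probe order so that the deletion can be tested.

```python
def insert_linear(table, K, h):
    """Algorithm L insertion, probing h(K), h(K)-1, ...; assumes an empty cell exists."""
    i = h(K)
    while table[i] is not None:
        if table[i] == K:
            return i
        i = (i - 1) % len(table)
    table[i] = K
    return i


def search_linear(table, K, h):
    """Position of K, or None if an empty cell is reached first."""
    i = h(K)
    for _ in range(len(table)):
        if table[i] is None:
            return None
        if table[i] == K:
            return i
        i = (i - 1) % len(table)
    return None


def delete_linear(table, i, h):
    """Algorithm R: delete the record in position i without leaving a 'deleted' marker."""
    M = len(table)
    table[i] = None                          # R1. Empty a cell.
    j = i
    while True:
        i = (i - 1) % M                      # R2. Decrease i.
        if table[i] is None:                 # R3. Inspect TABLE[i].
            return
        r = h(table[i])
        if i <= r < j or r < j < i or j < i <= r:
            continue                         # r lies cyclically between i and j: leave it
        table[j] = table[i]                  # R4. Move the record into the hole ...
        table[i] = None                      # ... then R1 again: position i is the new hole.
        j = i
```

Keys that pile up around a few hash addresses and wrap past position 0 exercise all three cases of the test in step R3.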

Exercise  22  shows  that  this  algorithm  causes  no  degradation  in  performance; 
in  other  words,  the  average  number  of  probes  predicted  in  Eqs.  (22)  and  (23) 
will  remain  the  same.  (A  weaker  result  for  tree  insertion  was  proved  in  Theorem 
6.2.2H.)  But  the  validity  of  Algorithm  R depends  heavily  on  the  fact  that  linear 
probing  is  involved,  and  no  analogous  deletion  procedure  for  use  with  Algorithm 
D is  possible.  The  average  running  time  of  Algorithm  R is  analyzed  in  exercise  64. 

Of  course  when  chaining  is  used  with  separate  lists  for  each  possible  hash 
value,  deletion  causes  no  problems  since  it  is  simply  a deletion  from  a linked 
linear  list.  Deletion  with  Algorithm  C is  discussed  in  exercise  23. 

Algorithm R may move some of the table entries, and this is undesirable
if they are being pointed to from elsewhere. Another approach to deletions is
possible by adapting some of the ideas used in garbage collection (see Section
2.3.5): We might keep a reference count with each key telling how many other
keys collide with it; then it is possible to convert unoccupied cells to empty status
when their reference count drops to zero. Alternatively we might go through the entire
table whenever too many deleted entries have accumulated, changing all the
unoccupied positions to empty and then looking up all remaining keys, in order
to see which unoccupied positions still require “deleted” status. These procedures,
which avoid relocation and work with any hash technique, were originally
suggested by T. Gunji and E. Goto [J. Information Proc. 3 (1980), 1-12].

* Analysis  of  the  algorithms.  It  is  especially  important  to  know  the  average 
behavior  of  a hashing  method,  because  we  are  committed  to  trusting  in  the 
laws  of  probability  whenever  we  hash.  The  worst  case  of  these  algorithms  is 
almost  unthinkably  bad,  so  we  need  to  be  reassured  that  the  average  behavior 
is  very  good. 

Before we get into the analysis of linear probing, etc., let us consider an
approximate model of the situation, called uniform probing. In this model, which
was suggested by W. W. Peterson [IBM J. Research & Devel. 1 (1957), 135-136],
we assume that each key is placed in a completely random location of the table, so
that each of the (M choose N) possible configurations of N occupied cells and M − N empty
cells is equally likely. This model ignores any effect of primary or secondary
clustering; the occupancy of each cell in the table is essentially independent of
all the others. Then the probability that any permutation of table positions needs
exactly r probes to insert the (N + 1)st item is the number of configurations in
which r − 1 given cells are occupied and another is empty, divided by (M choose N), namely


$$P_r = \binom{M-r}{N-r+1}\Big/\binom{M}{N};$$
therefore the average number of probes for uniform probing is

$$\begin{aligned}
C'_N &= \sum_{r=1}^{M} r P_r = M + 1 - \sum_{r=1}^{M} (M + 1 - r) P_r\\
&= M + 1 - \sum_{r=1}^{M} (M + 1 - r)\binom{M-r}{M-N-1}\Big/\binom{M}{N}\\
&= M + 1 - \sum_{r=1}^{M} (M - N)\binom{M+1-r}{M-N}\Big/\binom{M}{N}\\
&= M + 1 - (M - N)\binom{M+1}{M-N+1}\Big/\binom{M}{N}\\
&= M + 1 - (M - N)\frac{M+1}{M-N+1} = \frac{M+1}{M-N+1}. \qquad (32)
\end{aligned}$$

(We have already solved essentially the same problem in connection with random
sampling, in exercise 3.4.2-5.) Setting α = N/M, this exact formula for C'_N is
approximately equal to

$$\frac{1}{1-\alpha} = 1 + \alpha + \alpha^2 + \alpha^3 + \cdots, \qquad (33)$$

a series that has a rough intuitive interpretation: With probability α we need
more than one probe, with probability α^2 we need more than two, etc. The
corresponding average number of probes for a successful search is

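$$C_N = \frac{1}{N}\sum_{k=0}^{N-1} C'_k = \frac{M+1}{N}\Bigl(\frac{1}{M+1} + \frac{1}{M} + \cdots + \frac{1}{M-N+2}\Bigr) = \frac{M+1}{N}\bigl(H_{M+1} - H_{M-N+1}\bigr) \approx \frac{1}{\alpha}\ln\frac{1}{1-\alpha}. \qquad (34)$$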
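Formulas (32) and (34) can be checked mechanically with exact rational arithmetic. The sketch below uses illustrative function names. It sums r·P_r directly from the definition of P_r, averages C'_0, ..., C'_{N−1} to obtain C_N, and compares both with the closed forms for every table size up to 14.

```python
from fractions import Fraction
from math import comb


def uniform_unsuccessful(M, N):
    """C'_N = sum of r * P_r with P_r = C(M-r, N-r+1) / C(M, N), for 0 <= N < M."""
    return sum(Fraction(r * comb(M - r, N - r + 1), comb(M, N)) for r in range(1, N + 2))


def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def uniform_successful(M, N):
    """C_N as the average of C'_0, ..., C'_{N-1}, for 1 <= N <= M."""
    return sum(uniform_unsuccessful(M, k) for k in range(N)) / N


for M in range(1, 15):
    for N in range(M):
        assert uniform_unsuccessful(M, N) == Fraction(M + 1, M + 1 - N)        # (32)
    for N in range(1, M + 1):
        closed_form = Fraction(M + 1, N) * (harmonic(M + 1) - harmonic(M + 1 - N))
        assert uniform_successful(M, N) == closed_form                        # (34)
```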


As remarked above, extensive tests show that Algorithm D with two independent
hash functions behaves essentially like uniform probing, for all practical purposes.
In fact, double hashing is asymptotically equivalent to uniform probing, in the
limit as M → ∞ (see exercise 70).

This completes our analysis of uniform probing. In order to study linear
probing and other types of collision resolution, we need to set up the theory
in a different, more realistic way. The probabilistic model we shall use for this
purpose assumes that each of the M^N possible “hash sequences”

$$a_1\, a_2\, \ldots\, a_N, \qquad 0 \le a_j < M, \qquad (35)$$

is equally likely, where a_j denotes the initial hash address of the jth key inserted
into the table. The average number of probes in a successful search, given any
particular searching algorithm, will be denoted by C_N as above; this is assumed
to be the average number of probes needed to find the kth key, averaged over
1 ≤ k ≤ N with each key equally likely, and averaged over all hash sequences (35)
with each sequence equally likely. Similarly, the average number of probes needed
when the Nth key is inserted, considering all sequences (35) to be equally likely,
will be denoted by C'_{N−1}; this is the average number of probes in an unsuccessful
search starting with N − 1 elements in the table. When open addressing is used,

$$C_N = \frac{1}{N}\sum_{k=0}^{N-1} C'_k, \qquad (36)$$

so that we can deduce one quantity from the other as we have done in (34).

Strictly speaking, there are two defects even in this more accurate model. In
the first place, the different hash sequences aren’t all equally probable, because
the keys themselves are distinct. This makes the probability that a_1 = a_2 slightly
less than 1/M; but the difference is usually negligible since the set of all possible
keys is typically very large compared to M. (See exercise 24.) Furthermore a
good hash function will exploit the nonrandomness of typical data, making it
even less likely that a_1 = a_2; as a result, our estimates for the number of probes
will be pessimistic. Another inaccuracy in the model is indicated in Fig. 43:
Keys that occur earlier are (with some exceptions) more likely to be looked up
than keys that occur later. Therefore our estimate of C_N tends to be doubly
pessimistic, and the algorithms should perform slightly better in practice than
our analysis predicts.

With these precautions, we are ready to make an “exact” analysis of linear
probing.* Let f(M, N) be the number of hash sequences (35) such that position 0
of the table will be empty after the keys have been inserted by Algorithm L. The
circular symmetry of linear probing implies that position 0 is empty just as often
as any other position, so it is empty with probability 1 − N/M; in other words

$$f(M,N) = \Bigl(1 - \frac{N}{M}\Bigr) M^N. \qquad (37)$$

By convention we also set f(0, 0) = 1. Now let g(M, N, k) be the number of
hash sequences (35) such that the algorithm leaves position 0 empty, positions 1
through k occupied, and position k + 1 empty. We have

$$g(M,N,k) = \binom{N}{k} f(k+1,k)\, f(M-k-1,N-k), \qquad (38)$$

because all such hash sequences are composed of two subsequences, one (containing
k elements a_j ≤ k) that leaves position 0 empty and positions 1 through
k occupied and one (containing N − k elements a_j ≥ k + 1) that leaves position
k + 1 empty; there are f(k + 1, k) subsequences of the former type and
f(M − k − 1, N − k) of the latter type, and there are (N choose k) ways to intersperse two
such subsequences. Finally let P_k be the probability that exactly k + 1 probes
will be needed when the (N + 1)st key is inserted; it follows (see exercise 25)
that

$$P_k = M^{-N}\bigl(g(M,N,k) + g(M,N,k+1) + \cdots + g(M,N,N)\bigr). \qquad (39)$$

* The author cannot resist inserting a biographical note at this point: I first formulated
the following derivation in 1962, shortly after beginning work on The Art of Computer
Programming. Since this was the first nontrivial algorithm I had ever analyzed satisfactorily, it
had a strong influence on the structure of these books. Ever since that day, the analysis of
algorithms has in fact been one of the major themes of my life.

6.4 HASHING 537

Now C'_N = Σ_k (k + 1)P_k; putting this equation together with (36)–(39) and
simplifying yields the following result.


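Theorem K. The average number of probes needed by Algorithm L, assuming
that all M^N hash sequences (35) are equally likely, is

$$C_N = \tfrac12\bigl(1 + Q_0(M, N-1)\bigr) \qquad \text{(successful search)}, \qquad (40)$$

$$C'_N = \tfrac12\bigl(1 + Q_1(M, N)\bigr) \qquad \text{(unsuccessful search)}, \qquad (41)$$

where

$$Q_r(M,N) = \binom{r}{0} + \binom{r+1}{1}\frac{N}{M} + \binom{r+2}{2}\frac{N}{M}\frac{N-1}{M} + \cdots = \sum_{k\ge 0}\binom{r+k}{k}\frac{N}{M}\frac{N-1}{M}\cdots\frac{N-k+1}{M}. \qquad (42)$$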


Proof.  Details  of  the  calculation  are  worked  out  in  exercise  27.  (For  the  variance, 
see  exercises  28,  67,  and  68.)  | 
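For small tables, Theorem K can be confirmed by brute force. The sketch below enumerates all M^N hash sequences, runs Algorithm L on each (probing downward, as in the text), and averages with exact fractions. The helper names are illustrative. `Q` evaluates the sum (42), which is finite because the factor N − k vanishes when k = N.

```python
from fractions import Fraction
from itertools import product
from math import comb


def Q(r, M, N):
    """Q_r(M, N) of (42): sum over k of C(r+k, k) * N(N-1)...(N-k+1) / M**k."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term:
        total += comb(r + k, k) * term
        term *= Fraction(N - k, M)       # next falling-factorial factor
        k += 1
    return total


def probes_to_insert(table, a):
    """Probes Algorithm L needs for a key with hash address a; marks the cell it takes."""
    M, i, probes = len(table), a, 1
    while table[i]:
        i, probes = (i - 1) % M, probes + 1
    table[i] = True
    return probes


def exact_linear_probing(M, N):
    """(C_N, C'_N) for Algorithm L by enumerating all M**N hash sequences (1 <= N < M)."""
    successful = unsuccessful = 0
    for seq in product(range(M), repeat=N):
        table = [False] * M
        successful += sum(probes_to_insert(table, a) for a in seq)
        # The (N+1)st key may hash to any of the M addresses with equal probability.
        unsuccessful += sum(probes_to_insert(table[:], a) for a in range(M))
    return Fraction(successful, M**N * N), Fraction(unsuccessful, M**N * M)


for M in range(2, 6):
    for N in range(1, M):
        C, C_prime = exact_linear_probing(M, N)
        assert C == (1 + Q(0, M, N - 1)) / 2          # (40)
        assert C_prime == (1 + Q(1, M, N)) / 2        # (41)
```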

The rather strange-looking function Q_r(M, N) that appears in this theorem
is really not hard to deal with. We have

$$N^k - \binom{k}{2} N^{k-1} \le N(N-1)\ldots(N-k+1) \le N^k;$$

hence if N/M = α,

$$\sum_{k\ge0}\binom{r+k}{k}\Bigl(N^k - \binom{k}{2}N^{k-1}\Bigr)\Big/M^k \le Q_r(M,N) \le \sum_{k\ge0}\binom{r+k}{k}N^k\big/M^k,$$

$$\sum_{k\ge0}\binom{r+k}{k}\alpha^k - \frac{1}{M}\sum_{k\ge0}\binom{r+k}{k}\binom{k}{2}\alpha^{k-1} \le Q_r(M,\alpha M) \le \sum_{k\ge0}\binom{r+k}{k}\alpha^k;$$

that is,

$$\frac{1}{(1-\alpha)^{r+1}} - \frac{1}{M}\binom{r+2}{2}\frac{\alpha}{(1-\alpha)^{r+3}} \le Q_r(M,\alpha M) \le \frac{1}{(1-\alpha)^{r+1}}. \qquad (43)$$

This relation gives us a good estimate of Q_r(M, N) when M is large and α is
not too close to 1. (The lower bound is a better approximation than the upper
bound.) When α approaches 1, these formulas become useless, but fortunately
Q_0(M, M − 1) is the function Q(M) whose asymptotic behavior was studied in
great detail in Section 1.2.11.3; and Q_1(M, M − 1) is simply equal to M (see
exercise 50). In terms of the standard notation for hypergeometric functions,
Eq. 1.2.6-(39), we have Q_r(M, N) = F(r + 1, −N; ; −1/M).
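The bounds in (43) and the special value Q_1(M, M − 1) = M are easy to spot-check numerically. The sketch repeats the evaluation of (42) so that it stands alone. The table sizes and the values of r tested are arbitrary choices.

```python
from fractions import Fraction
from math import comb


def Q(r, M, N):
    """Q_r(M, N) of (42), evaluated exactly."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term:
        total += comb(r + k, k) * term
        term *= Fraction(N - k, M)
        k += 1
    return total


def bounds_43(r, M, N):
    """Lower and upper bounds of (43) for Q_r(M, N), with alpha = N/M < 1."""
    alpha = Fraction(N, M)
    upper = 1 / (1 - alpha) ** (r + 1)
    lower = upper - Fraction(comb(r + 2, 2), M) * alpha / (1 - alpha) ** (r + 3)
    return lower, upper


for M in (10, 40, 100):
    for N in range(0, M, 3):
        for r in (0, 1, 2):
            lower, upper = bounds_43(r, M, N)
            assert lower <= Q(r, M, N) <= upper
```

For moderate α the two bounds in this sketch are close, and the lower one tracks Q_r more closely, as the text remarks. Near α = 1 both bounds blow up.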
Another approach to the analysis of linear probing was taken in the early
days by G. Schay, Jr. and W. G. Spruth [CACM 5 (1962), 459-462]. Although
their method yielded only an approximation to the exact formulas in Theorem
K, it sheds further light on the algorithm, so we shall sketch it briefly here. First
let us consider a surprising property of linear probing that was first noticed by
W. W. Peterson in 1957:

Theorem P. The average number of probes in a successful search by Algorithm L
is independent of the order in which the keys were inserted; it depends
only on the number of keys that hash to each address.

In other words, any rearrangement of a hash sequence a_1 a_2 ... a_N yields
a hash sequence with the same average displacement of keys from their hash
addresses. (We are assuming, as stated earlier, that all keys in the table have
equal importance. If some keys are more frequently accessed than others, the
proof can be extended to show that an optimal arrangement occurs if we insert
them in decreasing order of frequency, using the method of Theorem 6.1S.)

Proof. It suffices to show that the total number of probes needed to insert keys
for the hash sequence a_1 a_2 ... a_N is the same as the total number needed for
a_1 ... a_{i−1} a_{i+1} a_i a_{i+2} ... a_N, 1 ≤ i < N. There is clearly no difference unless the
(i + 1)st key in the second sequence falls into the position occupied by the ith
in the first sequence. But then the ith and (i + 1)st merely exchange places, so
the number of probes for the (i + 1)st is decreased by the same amount that the
number for the ith is increased. |
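Theorem P can be observed directly: permuting a hash sequence never changes the total number of probes. The sketch below uses an illustrative helper name. It applies Algorithm L's downward probing to every distinct rearrangement of one hash sequence for M = 10. The sequence 1 1 1 2 2 4 9 9 is an arbitrary choice.

```python
from itertools import permutations


def total_probes(seq, M):
    """Total probes Algorithm L uses to insert keys whose hash addresses are seq."""
    occupied = [False] * M
    total = 0
    for a in seq:
        i = a
        total += 1
        while occupied[i]:
            i = (i - 1) % M
            total += 1
        occupied[i] = True
    return total


seq = (1, 1, 1, 2, 2, 4, 9, 9)
totals = {total_probes(p, 10) for p in set(permutations(seq))}
print(totals)   # {20}: all 1680 distinct orders cost the same
```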

Theorem P tells us that the average search length for a hash sequence
a_1 a_2 ... a_N can be determined from the numbers b_0 b_1 ... b_{M−1}, where b_j is the
number of a's that equal j. From this sequence we can determine the “carry
sequence” c_0 c_1 ... c_{M−1}, where c_j is the number of keys for which both locations
j and j − 1 are probed as the key is inserted. This sequence is determined by
the rule

$$c_j = \begin{cases} 0, & \text{if } b_j + c_{(j+1) \bmod M} = 0;\\ b_j + c_{(j+1) \bmod M} - 1, & \text{otherwise.}\end{cases} \qquad (44)$$

For example, let M = 10, N = 8, and b_0 ... b_9 = 0 3 2 0 1 0 0 0 0 2; then
c_0 ... c_9 = 2 3 1 0 0 0 0 1 2 3, since one key needs to be “carried over” from
position 2 to position 1, three from position 1 to position 0, two of these from
position 0 to position 9, etc. We have b_0 + b_1 + ··· + b_{M−1} = N, and the average
number of probes needed for retrieval of the N keys is

$$1 + (c_0 + c_1 + \cdots + c_{M-1})/N. \qquad (45)$$

Rule (44) seems to be a circular definition of the c's in terms of themselves, but
actually there is a unique solution to the stated equations whenever N < M (see
exercise 32).
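Rule (44) can be solved without guesswork by sweeping downward around the table twice, starting with a provisional carry of 0. The map x ↦ max(0, b_j + x − 1) is monotone, so the provisional values never exceed the true c_j. Some true c_j must be 0: if every c_j were positive, summing (44) over all j would give N = M. From that position on, the sweep reproduces the exact solution, so two passes suffice. This is a sketch of one workable method and is not necessarily the approach of exercise 32.

```python
from fractions import Fraction


def carry_sequence(b):
    """Solve rule (44) for c_0 ... c_{M-1}, given b_j = number of keys hashing to j (N < M)."""
    M = len(b)
    if sum(b) >= M:
        raise ValueError("need N < M")
    c = [0] * M
    carry = 0                                 # provisional value of c_{(j+1) mod M}
    for step in range(2 * M):                 # j = M-1, M-2, ..., 0, then once more
        j = M - 1 - step % M
        # If any keys reach position j, one stays there and the rest move on to j - 1.
        carry = max(0, b[j] + carry - 1)
        c[j] = carry
    return c


def average_probes(b):
    """Average retrieval cost (45): 1 + (c_0 + ... + c_{M-1}) / N."""
    return 1 + Fraction(sum(carry_sequence(b)), sum(b))


b = [0, 3, 2, 0, 1, 0, 0, 0, 0, 2]
print(carry_sequence(b))    # [2, 3, 1, 0, 0, 0, 0, 1, 2, 3]
print(average_probes(b))    # 5/2
```

For the example in the text, the total is N + Σc_j = 8 + 12 = 20 probes. That agrees with inserting any arrangement of the hash sequence 1 1 1 2 2 4 9 9 by Algorithm L.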

Schay and Spruth used this idea to determine the probability q_k that c_j = k,
in terms of the probability p_k that b_j = k. (These probabilities are independent
of j.) Thus

$$\begin{aligned} q_0 &= p_0 q_0 + p_1 q_0 + p_0 q_1,\\ q_1 &= p_2 q_0 + p_1 q_1 + p_0 q_2, \qquad (46)\\ q_2 &= p_3 q_0 + p_2 q_1 + p_1 q_2 + p_0 q_3,\end{aligned}$$

etc., since, for example, the probability that c_j = 2 is the probability that
b_j + c_{(j+1) mod M} = 3. Let B(z) = Σ p_k z^k and C(z) = Σ q_k z^k be the generating
functions for these probability distributions; the equations (46) are equivalent to

$$B(z)C(z) = p_0q_0 + (q_0 - p_0q_0)z + q_1z^2 + \cdots = p_0q_0(1 - z) + zC(z).$$

Since B(1) = 1, we may write B(z) = 1 + (z − 1)D(z), and it follows that

$$C(z) = \frac{p_0 q_0}{1 - D(z)} = \frac{1 - D(1)}{1 - D(z)}, \qquad (47)$$


since C(1) = 1. The average number of probes needed for retrieval, according
to (45), will therefore be

$$1 + \frac{M}{N}C'(1) = 1 + \frac{M}{N}\frac{D'(1)}{1 - D(1)} = 1 + \frac{M}{2N}\frac{B''(1)}{1 - B'(1)}. \qquad (48)$$


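Since we are assuming that each hash sequence a_1 ... a_N is equally likely, we
have

$$p_k = \Pr(\text{exactly } k \text{ of the } a_i \text{ are equal to } j, \text{ for fixed } j) = \binom{N}{k}\Bigl(\frac{1}{M}\Bigr)^{k}\Bigl(1 - \frac{1}{M}\Bigr)^{N-k}; \qquad (49)$$

hence

$$B(z) = \Bigl(1 + \frac{z-1}{M}\Bigr)^N, \qquad B'(1) = \frac{N}{M}, \qquad B''(1) = \frac{N(N-1)}{M^2}, \qquad (50)$$

and the average number of probes according to (48) will be

$$C_N = \frac12\Bigl(1 + \frac{M-1}{M-N}\Bigr). \qquad (51)$$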


Can  the  reader  spot  the  incorrect  reasoning  that  has  caused  this  answer  to  be 
different  from  the  correct  result  in  Theorem  K?  (See  exercise  33.) 


*Optimality considerations. We have seen several examples of probe sequences
for open addressing, and it is natural to ask for one that can be proved best
possible in some meaningful sense. This problem has been set up in the following
interesting way by J. D. Ullman [JACM 19 (1972), 569-575]: Instead of
computing a hash address h(K), we map each key K into an entire permutation
of {0, 1, ..., M − 1}, which represents the probe sequence to use for K. Each of
the M! permutations is assigned a probability, and the generalized hash function
is supposed to select each permutation with that probability. The question is,
“What assignment of probabilities to permutations gives the best performance,
in the sense that the corresponding average number of probes C_N or C'_N is
minimized?”

For example, if we assign the probability 1/M! to each permutation, it is
easy to see that we have exactly the behavior of uniform probing that we have
analyzed above in (32) and (34). However, Ullman found an example with M = 4
and N = 2 for which C'_N is smaller than the value 5/3 obtained with uniform
probing. His construction assigns zero probability to all but the following six
permutations:


    Permutation   Probability       Permutation   Probability
    0 1 2 3       (1 + 2ε)/6        1 0 3 2       (1 + 2ε)/6
    2 0 1 3       (1 − ε)/6         2 1 0 3       (1 − ε)/6         (52)
    3 0 1 2       (1 − ε)/6         3 1 0 2       (1 − ε)/6


Roughly speaking, the first probe tends to be either 2 or 3, but the second probe
is always 0 or 1. The average number of probes needed to insert the third item,
C'_2, turns out to be 5/3 − ε/9 + O(ε^2), so we can improve on uniform probing by
taking ε to be a small positive value.

However, the corresponding value of $C'_1$ for these probabilities is $23/18 + O(\epsilon)$,
which is larger than $5/4$ (the uniform probing value). Ullman proved that any
assignment of probabilities such that $C'_N < (M+1)/(M+1-N)$ for some $N$
always implies that $C'_n > (M+1)/(M+1-n)$ for some $n < N$; you can't win
all the time over uniform probing.
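The values quoted for (52) can be checked mechanically. The following Python sketch is our own illustration, and the names `insert`, `unsuccessful_cost`, and `ullman_52` are hypothetical. It computes $C'_n$ exactly for an arbitrary assignment of probabilities to probe permutations by enumerating the permutations drawn by the first $n+1$ keys. Exact rational arithmetic (`fractions.Fraction`) is used, so $\epsilon$ must be given a concrete rational value; the sketch does no symbolic algebra.

```python
from fractions import Fraction
from itertools import product, permutations


def insert(occupied, perm):
    """Insert one key that follows the probe sequence `perm`.

    Returns (number of probes, new frozenset of occupied cells)."""
    for probes, cell in enumerate(perm, start=1):
        if cell not in occupied:
            return probes, occupied | {cell}
    raise ValueError("table is full")


def unsuccessful_cost(assignment, n):
    """C'_n: expected probes to insert key n+1 after n keys, where
    `assignment` maps permutation tuples to their probabilities."""
    items = list(assignment.items())
    total = Fraction(0)
    for choice in product(items, repeat=n + 1):   # permutations drawn by keys 1..n+1
        weight, occupied = Fraction(1), frozenset()
        for perm, p in choice[:-1]:               # place the first n keys
            weight *= p
            _, occupied = insert(occupied, perm)
        last_perm, last_p = choice[-1]            # count probes for key n+1
        probes, _ = insert(occupied, last_perm)
        total += weight * last_p * probes
    return total


def ullman_52(eps):
    """The six-permutation assignment (52) for M = 4."""
    a, b = (1 + 2 * eps) / 6, (1 - eps) / 6
    return {(0, 1, 2, 3): a, (2, 0, 1, 3): b, (3, 0, 1, 2): b,
            (1, 0, 3, 2): a, (2, 1, 0, 3): b, (3, 1, 0, 2): b}


uniform = {perm: Fraction(1, 24) for perm in permutations(range(4))}
eps = Fraction(1, 100)
print(unsuccessful_cost(uniform, 2), unsuccessful_cost(ullman_52(eps), 2))
print(unsuccessful_cost(uniform, 1), unsuccessful_cost(ullman_52(eps), 1))
```

With $\epsilon = 1/100$ the value for $n = 2$ comes out slightly below $5/3$. The value for $n = 1$ is $3189/2500$, which exceeds $5/4$, just as Ullman's theorem predicts.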

Actually the number of probes $C_N$ for a successful search is a better measure
than $C'_N$. The permutations in (52) do not lead to an improved value of $C_N$ for
any $N$, and indeed Ullman conjectured that no assignment of probabilities will
be able to make $C_N$ less than the uniform value $((M+1)/N)(H_{M+1} - H_{M+1-N})$.
Andrew Yao proved an asymptotic form of this conjecture by showing that the
limiting cost when $N = \alpha M$ and $M \to \infty$ is always $\ge \frac{1}{\alpha}\ln\frac{1}{1-\alpha}$ [JACM 32
(1985), 687–693].

The strong form of Ullman's conjecture appears to be very difficult to prove,
especially because there are many ways to assign probabilities to achieve the
effect of uniform probing; we do not need to assign $1/M!$ to each permutation.


For example, the following assignment for $M = 4$ is equivalent to uniform probing:

Permutation    Probability       Permutation    Probability
0 1 2 3        1/6               0 2 1 3        1/12
1 2 3 0        1/6               1 3 2 0        1/12
2 3 0 1        1/6               2 0 3 1        1/12          (53)
3 0 1 2        1/6               3 1 0 2        1/12

with zero probability assigned to the other 16 permutations.

The following theorem characterizes all assignments that produce the
behavior of uniform probing.


Theorem U. An assignment of probabilities to permutations will make each
of the $\binom{M}{N}$ configurations of empty and occupied cells equally likely after $N$
insertions, for $0 < N < M$, if and only if the sum of probabilities assigned to all
permutations whose first $N$ elements are the members of a given $N$-element set
is $1/\binom{M}{N}$, for all $N$ and for all $N$-element sets.

For example, the sum of probabilities assigned to each of the $3!\,(M-3)!$
permutations beginning with the numbers $\{0, 1, 2\}$ in some order must be $1/\binom{M}{3} =
3!\,(M-3)!/M!$. Observe that the condition of this theorem holds in (53), because
$1/6 + 1/12 = 1/4$.
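The condition of Theorem U is finite and can be checked directly. The following Python sketch is illustrative only, and the function name `satisfies_theorem_u` is ours. For every $N$ and every $N$-element set $A$ of cells, it adds up the probabilities of the permutations whose first $N$ elements form $A$, and it compares each total with $1/\binom{M}{N}$ using exact fractions.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def satisfies_theorem_u(assignment, M):
    """Check the condition of Theorem U for an assignment mapping
    permutations of range(M) to probabilities."""
    for N in range(1, M + 1):
        target = Fraction(1, comb(M, N))
        for A in combinations(range(M), N):
            S = sum((p for perm, p in assignment.items()
                     if set(perm[:N]) == set(A)), Fraction(0))
            if S != target:
                return False
    return True


eq53 = {(0, 1, 2, 3): Fraction(1, 6), (1, 2, 3, 0): Fraction(1, 6),
        (2, 3, 0, 1): Fraction(1, 6), (3, 0, 1, 2): Fraction(1, 6),
        (0, 2, 1, 3): Fraction(1, 12), (1, 3, 2, 0): Fraction(1, 12),
        (2, 0, 3, 1): Fraction(1, 12), (3, 1, 0, 2): Fraction(1, 12)}
uniform = {perm: Fraction(1, 24) for perm in permutations(range(4))}
print(satisfies_theorem_u(eq53, 4), satisfies_theorem_u(uniform, 4))
```

Both the assignment (53) and the assignment of $1/24$ to every permutation pass this test. An assignment that always uses the single probe sequence 0 1 2 3 fails it.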

Proof. Let $A \subseteq \{0, 1, \ldots, M-1\}$, and let $\Pi(A)$ be the set of all permutations
whose first $|A|$ elements are members of $A$; also let $S(A)$ be the sum of the
probabilities assigned to those permutations. Let $P_k(A)$ be the probability that
the first $|A|$ insertions of the open addressing procedure occupy the locations
specified by $A$, and that the last insertion required exactly $k$ probes. Finally, let
$P(A) = P_1(A) + P_2(A) + \cdots$. The proof is by induction on $N \ge 1$, assuming
that

$$P(A) = S(A) = 1\Big/\binom{M}{n}$$

for all sets $A$ with $|A| = n < N$. Let $B$ be any $N$-element set. Then

$$P_k(B) = \sum_{\substack{A \subseteq B\\ |A| = k}} \; \sum_{\pi \in \Pi(A)} \Pr(\pi)\, P\bigl(B \setminus \{\pi_k\}\bigr),$$

where $\Pr(\pi)$ is the probability assigned to permutation $\pi$ and $\pi_k$ is its $k$th
element. By induction

$$P_k(B) = \frac{1}{\binom{M}{N-1}} \sum_{\substack{A \subseteq B\\ |A| = k}} \; \sum_{\pi \in \Pi(A)} \Pr(\pi),$$

which equals $\binom{N}{k}\big/\bigl(\binom{M}{N-1}\binom{M}{k}\bigr)$ if $k < N$, and $S(B)\big/\binom{M}{N-1}$ if $k = N$;
hence

$$P(B) = \frac{1}{\binom{M}{N-1}}\left(S(B) + \sum_{k=1}^{N-1} \binom{N}{k}\Big/\binom{M}{k}\right),$$

and this can be equal to $1/\binom{M}{N}$ if and only if $S(B)$ has the correct value.
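The conclusion of the theorem can also be observed directly, by computing the exact probability of every configuration of occupied cells after $N$ insertions. This Python sketch is illustrative, and `configuration_distribution` is a hypothetical name of ours. It propagates the distribution one insertion at a time, and each key independently draws its probe permutation from the given assignment. For the assignment (53) every configuration with $N$ occupied cells should come out with probability $1/\binom{4}{N}$.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def configuration_distribution(assignment, N):
    """Exact probability of each set of occupied cells after N insertions."""
    dist = {frozenset(): Fraction(1)}
    for _ in range(N):
        nxt = defaultdict(Fraction)            # missing entries start at 0
        for occupied, q in dist.items():
            for perm, p in assignment.items():
                cell = next(c for c in perm if c not in occupied)  # first empty probe
                nxt[occupied | {cell}] += q * p
        dist = dict(nxt)
    return dist


eq53 = {(0, 1, 2, 3): Fraction(1, 6), (1, 2, 3, 0): Fraction(1, 6),
        (2, 3, 0, 1): Fraction(1, 6), (3, 0, 1, 2): Fraction(1, 6),
        (0, 2, 1, 3): Fraction(1, 12), (1, 3, 2, 0): Fraction(1, 12),
        (2, 0, 3, 1): Fraction(1, 12), (3, 1, 0, 2): Fraction(1, 12)}
for N in range(1, 4):
    dist = configuration_distribution(eq53, N)
    print(N, len(dist), set(dist.values()))
```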


External searching. Hashing techniques lend themselves well to external
searching on direct-access storage devices like disks or drums. For such
applications, as in Section 6.2.4, we want to minimize the number of accesses to
the file, and this has two major effects on the choice of algorithms:

1) It is reasonable to spend more time computing the hash function, since the
penalty for bad hashing is much greater than the cost of the extra time
needed to do a careful job.

2) The records are usually grouped into pages or buckets, so that several records
are fetched from the external memory each time.
The file is divided into $M$ buckets containing $b$ records each. Collisions now
cause no problem unless more than $b$ keys have the same hash address. The
following three approaches to collision resolution seem to be best:

A) Chaining with separate lists. If more than $b$ records fall into the same bucket,
a link to an overflow record can be inserted at the end of the first bucket. These
overflow records are kept in a special overflow area. There is usually no advantage
in having buckets in the overflow area, since comparatively few overflows occur;
thus, the extra records are usually linked together so that the $(b+k)$th record
of a list requires $1 + k$ accesses. It is usually a good idea to leave some room for
overflows on each cylinder of a disk file, so that most accesses are to the same
cylinder.

Although this method of handling overflows seems inefficient, the number of
overflows is statistically small enough that the average search time is very good.
See Tables 2 and 3, which show the average number of accesses required as a
function of the load factor

$$\alpha = N/(Mb), \tag{54}$$

for fixed $\alpha$ as $M, N \to \infty$. Curiously, when $\alpha = 1$ the asymptotic number of
accesses for an unsuccessful search increases with increasing $b$.


Table 2
AVERAGE ACCESSES IN AN UNSUCCESSFUL SEARCH BY SEPARATE CHAINING

Bucket                                  Load factor, $\alpha$
size, b   10%     20%     30%     40%     50%     60%     70%    80%    90%    95%
   1      1.0048  1.0187  1.0408  1.0703  1.1065  1.1488  1.197  1.249  1.307  1.34
   2      1.0012  1.0088  1.0269  1.0581  1.1036  1.1638  1.238  1.327  1.428  1.48
   3      1.0003  1.0038  1.0162  1.0433  1.0898  1.1588  1.252  1.369  1.509  1.59
   4      1.0001  1.0016  1.0095  1.0314  1.0751  1.1476  1.253  1.394  1.571  1.67
   5      1.0000  1.0007  1.0056  1.0225  1.0619  1.1346  1.249  1.410  1.620  1.74
  10      1.0000  1.0000  1.0004  1.0041  1.0222  1.0773  1.201  1.426  1.773  2.00
  20      1.0000  1.0000  1.0000  1.0001  1.0028  1.0234  1.113  1.367  1.898  2.29
  50      1.0000  1.0000  1.0000  1.0000  1.0000  1.0007  1.018  1.182  1.920  2.70

Table 3
AVERAGE ACCESSES IN A SUCCESSFUL SEARCH BY SEPARATE CHAINING

Bucket                                  Load factor, $\alpha$
size, b   10%     20%     30%     40%     50%     60%     70%    80%    90%    95%
   1      1.0500  1.1000  1.1500  1.2000  1.2500  1.3000  1.350  1.400  1.450  1.48
   2      1.0063  1.0242  1.0520  1.0883  1.1321  1.1823  1.238  1.299  1.364  1.40
   3      1.0010  1.0071  1.0215  1.0458  1.0806  1.1259  1.181  1.246  1.319  1.36
   4      1.0002  1.0023  1.0097  1.0257  1.0527  1.0922  1.145  1.211  1.290  1.33
   5      1.0000  1.0008  1.0046  1.0151  1.0358  1.0699  1.119  1.186  1.268  1.32
  10      1.0000  1.0000  1.0002  1.0015  1.0070  1.0226  1.056  1.115  1.206  1.27
  20      1.0000  1.0000  1.0000  1.0000  1.0005  1.0038  1.018  1.059  1.150  1.22
  50      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.001  1.015  1.083  1.16
B) Chaining with coalescing lists. Instead of providing a separate overflow area,
we can adapt Algorithm C to external files. A doubly linked list of available
space can be maintained for each cylinder, linking together each bucket that is
not yet full. Under this scheme, every bucket contains a count of how many
record positions are empty, and the bucket is removed from the doubly linked
list only when its count becomes zero. A “roving pointer” can be used to
distribute overflows (see exercise 2.5–6), so that different chains tend to use
different overflow buckets. This method has not yet been analyzed, but it might
prove to be quite useful.

C) Open addressing. We can also do without links, using an “open” method.
Linear probing is probably better than random probing when we consider
external searching, because the increment $c$ can often be chosen so that it minimizes
latency delays between consecutive accesses. The approximate theoretical model
of linear probing that was worked out above can be generalized to account for
the influence of buckets, and it shows that linear probing is indeed satisfactory
unless the table has gotten very full. For example, see Table 4; when the load
factor is 90 percent and the bucket size is 50, the average number of accesses in
a successful search is only 1.04. This is actually better than the 1.08 accesses
required by the chaining method (A) with the same bucket size!


Table 4
AVERAGE ACCESSES IN A SUCCESSFUL SEARCH BY LINEAR PROBING

Bucket                                  Load factor, $\alpha$
size, b   10%     20%     30%     40%     50%     60%     70%    80%    90%    95%
   1      1.0556  1.1250  1.2143  1.3333  1.5000  1.7500  2.167  3.000  5.500  10.50
   2      1.0062  1.0242  1.0553  1.1033  1.1767  1.2930  1.494  1.903  3.147  5.64
   3      1.0009  1.0066  1.0201  1.0450  1.0872  1.1584  1.286  1.554  2.378  4.04
   4      1.0001  1.0021  1.0085  1.0227  1.0497  1.0984  1.190  1.386  2.000  3.24
   5      1.0000  1.0007  1.0039  1.0124  1.0307  1.0661  1.136  1.289  1.777  2.77
  10      1.0000  1.0000  1.0001  1.0011  1.0047  1.0154  1.042  1.110  1.345  1.84
  20      1.0000  1.0000  1.0000  1.0000  1.0003  1.0020  1.010  1.036  1.144  1.39
  50      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.001  1.005  1.040  1.13
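Methods (A) and (C) can be compared on a small scale with toy in-memory models. The Python sketch below is not taken from the text. The class names, the use of Python's built-in `hash` on random integer keys in place of a carefully chosen hash function, and the increment $c = 1$ for method (C) are all assumptions made for illustration. Each bucket read and each overflow record read counts as one access, so the averages for $M = 5000$, $b = 10$, and load factor 0.7 should land near the corresponding entries of Tables 3 and 4 (about 1.056 and 1.042).

```python
import random


class ChainedBucketFile:
    """Toy model of method (A): M primary buckets of b records each, plus a
    separate overflow chain per bucket in which each record costs one access."""

    def __init__(self, M, b):
        self.M, self.b = M, b
        self.buckets = [[] for _ in range(M)]
        self.overflow = [[] for _ in range(M)]

    def insert(self, key):
        i = hash(key) % self.M                 # stand-in for a careful hash function
        if len(self.buckets[i]) < self.b:
            self.buckets[i].append(key)
        else:
            self.overflow[i].append(key)       # linked onto the end of the chain

    def search(self, key):
        """Return (found, accesses): 1 for the bucket, +1 per overflow record read."""
        i = hash(key) % self.M
        if key in self.buckets[i]:
            return True, 1
        for accesses, rec in enumerate(self.overflow[i], start=2):
            if rec == key:
                return True, accesses
        return False, 1 + len(self.overflow[i])


class LinearProbingBucketFile:
    """Toy model of method (C): open addressing over M buckets of b records,
    moving on to the next bucket when one is full (increment c = 1 assumed)."""

    def __init__(self, M, b):
        self.M, self.b = M, b
        self.buckets = [[] for _ in range(M)]
        self.size = 0

    def insert(self, key):
        if self.size == self.M * self.b:
            raise OverflowError("file is full")
        i = hash(key) % self.M
        while len(self.buckets[i]) == self.b:
            i = (i + 1) % self.M
        self.buckets[i].append(key)
        self.size += 1

    def search(self, key):
        """Return (found, accesses), counting one access per bucket read."""
        i = hash(key) % self.M
        for accesses in range(1, self.M + 1):
            if key in self.buckets[i]:
                return True, accesses
            if len(self.buckets[i]) < self.b:  # a bucket with room ends the search
                return False, accesses
            i = (i + 1) % self.M
        return False, self.M


def average_successful(table, keys):
    return sum(table.search(k)[1] for k in keys) / len(keys)


rng = random.Random(1972)
M, b = 5000, 10
keys = rng.sample(range(10**12), 7 * M * b // 10)    # load factor 0.7
chained, probed = ChainedBucketFile(M, b), LinearProbingBucketFile(M, b)
for k in keys:
    chained.insert(k)
    probed.insert(k)
print(average_successful(chained, keys), average_successful(probed, keys))
```

A simulation of this kind only approximates the limiting values in the tables, because $M$ is finite and the keys are random.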

The analysis of methods (A) and (C) involves some very interesting
mathematics; we shall merely summarize the results here, since the details are worked
out in exercises 49 and 55. The formulas involve two functions strongly related
to the $Q$-functions of Theorem K, namely

$$R(\alpha, n) = \frac{n}{n+1} + \frac{n^2\alpha}{(n+1)(n+2)} + \frac{n^3\alpha^2}{(n+1)(n+2)(n+3)} + \cdots \tag{55}$$

and

$$t_n(\alpha) = e^{-n\alpha}\left(\frac{(\alpha n)^n}{(n+1)!} + 2\,\frac{(\alpha n)^{n+1}}{(n+2)!} + 3\,\frac{(\alpha n)^{n+2}}{(n+3)!} + \cdots\right) = \frac{e^{-n\alpha}\,n^n\alpha^n}{n!}\bigl(1 - (1-\alpha)R(\alpha, n)\bigr). \tag{56}$$
In terms of these functions, the average number of accesses made by the chaining
method (A) in an unsuccessful search is

$$C'_N = 1 + \alpha b\, t_b(\alpha) + O\Bigl(\frac{1}{M}\Bigr) \tag{57}$$

as $M, N \to \infty$, and the corresponding number in a successful search is

$$C_N = 1 + \frac{e^{-b\alpha}\,b^b\alpha^b}{2\,b!}\Bigl(2 + (\alpha-1)b + \bigl(\alpha^2 + (\alpha-1)^2(b-1)\bigr)R(\alpha, b)\Bigr) + O\Bigl(\frac{1}{M}\Bigr). \tag{58}$$

The limiting values of these formulas are the quantities shown in Tables 2 and 3.
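Formulas (55) through (58) are easy to evaluate numerically. The following Python sketch is our own, and its function names are illustrative. It sums the series (55) until the terms are negligible, and it computes the Poisson factor $e^{-n\alpha}(n\alpha)^n/n!$ in logarithms so that large $b$ does not overflow. It then evaluates the limiting values of (57) and (58), with the error terms dropped, so that the output can be compared with Tables 2 and 3.

```python
from math import exp, lgamma, log


def R(alpha, n):
    """R(alpha, n) of Eq. (55), summed until the terms are negligible."""
    total, term, j = 0.0, n / (n + 1), 0
    while term > 1e-17:
        total += term
        j += 1
        term *= n * alpha / (n + j + 1)      # ratio of consecutive terms
    return total


def poisson_peak(n, alpha):
    """e^{-n alpha} (n alpha)^n / n!, computed in logarithms to avoid overflow."""
    return exp(-n * alpha + n * log(n * alpha) - lgamma(n + 1))


def t(n, alpha):
    """t_n(alpha), by the closed form on the right of Eq. (56)."""
    return poisson_peak(n, alpha) * (1 - (1 - alpha) * R(alpha, n))


def chaining_unsuccessful(alpha, b):
    """Limiting value of C'_N in Eq. (57)."""
    return 1 + alpha * b * t(b, alpha)


def chaining_successful(alpha, b):
    """Limiting value of C_N in Eq. (58)."""
    bracket = 2 + (alpha - 1) * b + (alpha**2 + (alpha - 1)**2 * (b - 1)) * R(alpha, b)
    return 1 + poisson_peak(b, alpha) / 2 * bracket


for b in (1, 2, 5, 10):
    print(b, round(chaining_unsuccessful(0.9, b), 3), round(chaining_successful(0.9, b), 3))
```

When $b = 1$ these reduce to the familiar $\alpha + e^{-\alpha}$ and $1 + \alpha/2$, which gives a convenient sanity check.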

Since chaining method (A) requires a separate overflow area, we need to
estimate how many overflows will occur. The average number of overflows will
be $M(C'_N - 1) = N t_b(\alpha)$, since $C'_N - 1$ is the average number of overflows in any
given list. Therefore Table 2 can be used to deduce the amount of overflow space
required. For fixed $\alpha$, the standard deviation of the total number of overflows
will be roughly proportional to $\sqrt{M}$ as $M \to \infty$.

Asymptotic values for $C'_N$ and $C_N$ appear in exercise 53, but the
approximations aren't very good when $b$ is small or $\alpha$ is large; fortunately the series for
$R(\alpha, n)$ converges rather rapidly even when $\alpha$ is large, so the formulas can be
evaluated to any desired precision without much difficulty. The maximum values
occur for $\alpha = 1$, when

$$\max C'_N = 1 + \frac{e^{-b}\,b^{b+1}}{b!} = \sqrt{\frac{b}{2\pi}} + 1 + O(b^{-1/2}), \tag{59}$$

$$\max C_N = 1 + \frac{e^{-b}\,b^b}{2\,b!}\bigl(R(b) + 1\bigr) = \frac{5}{4} + \sqrt{\frac{2}{9\pi b}} + O(b^{-1}), \tag{60}$$

as $b \to \infty$, by Stirling's approximation and the analysis of the function $R(n) =
R(1, n) + 1$ in Section 1.2.11.3.
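The exact and asymptotic forms of the maxima at $\alpha = 1$ can be compared numerically. In this illustrative Python sketch the names are ours. `R_knuth(n)` stands for the function $R(n) = 1 + n/(n+1) + n^2/((n+1)(n+2)) + \cdots$ of Section 1.2.11.3, so that $R(1, n) = R(n) - 1$. The two printed differences between exact and asymptotic values should shrink as $b$ grows.

```python
from math import exp, lgamma, log, pi, sqrt


def R_knuth(n):
    """R(n) = 1 + n/(n+1) + n^2/((n+1)(n+2)) + ..., i.e. R(1, n) + 1."""
    total, term, j = 1.0, n / (n + 1), 1
    while term > 1e-17:
        total += term
        j += 1
        term *= n / (n + j)
    return total


def max_unsuccessful(b):
    """Exact form 1 + e^{-b} b^{b+1} / b!, evaluated in logarithms."""
    return 1 + exp(-b + (b + 1) * log(b) - lgamma(b + 1))


def max_successful(b):
    """Exact form 1 + e^{-b} b^b (R(b) + 1) / (2 b!)."""
    return 1 + exp(-b + b * log(b) - lgamma(b + 1)) * (R_knuth(b) + 1) / 2


for b in (1, 10, 100, 1000):
    print(b,
          max_unsuccessful(b) - (sqrt(b / (2 * pi)) + 1),
          max_successful(b) - (5 / 4 + sqrt(2 / (9 * pi * b))))
```

For $b = 1$ the exact values are $1 + e^{-1}$ and $1.5$, which agree with the $b = 1$ rows of Tables 2 and 3 extended to $\alpha = 1$.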

The average number of accesses in a successful external search with linear
probing has the remarkably simple expression

$$C_N \approx 1 + t_b(\alpha) + t_{2b}(\alpha) + t_{3b}(\alpha) + \cdots, \tag{61}$$

which can be understood as follows: The average total number of accesses to
look up all $N$ keys is $NC_N$, and this is $N + T_1 + T_2 + \cdots$, where $T_k$ is the average
number of keys that require more than $k$ accesses. Theorem P says that we can
enter the keys in any order without affecting $C_N$, and it follows that $T_k$ is the
average number of overflow records that would occur in the chaining method if
we had $M/k$ buckets of size $kb$, namely $N t_{kb}(\alpha)$ by what we said above. Further
justification of Eq. (61) appears in exercise 55.
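The series (61) converges quickly for moderate $\alpha$, so it too is easy to evaluate. This self-contained Python sketch is illustrative, and the names are ours. It repeats the helpers for (55) and (56), then sums (61) until the terms fall below a tolerance. For $b = 1$ the sum should reproduce the familiar linear probing value $\frac12\bigl(1 + 1/(1-\alpha)\bigr)$ of Algorithm L, and for other values of $b$ it should reproduce the entries of Table 4.

```python
from math import exp, lgamma, log


def R(alpha, n):
    """R(alpha, n) of Eq. (55)."""
    total, term, j = 0.0, n / (n + 1), 0
    while term > 1e-17:
        total += term
        j += 1
        term *= n * alpha / (n + j + 1)
    return total


def t(n, alpha):
    """t_n(alpha) of Eq. (56); the Poisson factor is computed in logarithms."""
    peak = exp(-n * alpha + n * log(n * alpha) - lgamma(n + 1))
    return peak * (1 - (1 - alpha) * R(alpha, n))


def linear_probing_successful(alpha, b, tol=1e-13):
    """Sum the series (61), 1 + t_b + t_{2b} + ..., until a term drops below tol."""
    total, k = 1.0, 1
    while True:
        term = t(k * b, alpha)
        total += term
        if term < tol:
            return total
        k += 1


print(linear_probing_successful(0.5, 1), linear_probing_successful(0.9, 50))
```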

An excellent early discussion of practical considerations involved in the
design of external hash tables was given by Charles A. Olson, Proc. ACM Nat. Conf.
24 (1969), 539–549. He included several worked examples and pointed out that
the number of overflow records will increase substantially if the file is subject to
frequent insertion/deletion activity without relocating records. He also presented
an analysis of this situation that was obtained jointly with J. A. de Peyster.
[Fig. 44. Comparison of collision resolution methods: limiting values of the average
number of probes as $M \to \infty$. Legend: L = linear probing = Algorithm L;
U2 = random probing with secondary clustering; U = uniform hashing ≈ Algorithm D;
B = Brent's variation of Algorithm D.]


Comparison of the methods. We have now studied a large number of
techniques for searching; how can we select the right one for a given application?
It is difficult to summarize in a few words all the relevant details of the trade-offs
involved in the choice of a search method, but the following things seem to be
of primary importance with respect to the speed of searching and the requisite
storage space.

Figure 44 summarizes the analyses of this section, showing that the various
methods for collision resolution lead to different numbers of probes. But probe
counting does not tell the whole story, since the time per probe varies in different
methods, and the latter variation has a noticeable effect on the running time (as
we have seen in Fig. 42). Linear probing accesses the table more frequently
than the other methods shown in Fig. 44, but it has the advantage of simplicity.
Furthermore, even linear probing isn't terribly bad: When the table is 90 percent
full, Algorithm L requires fewer than 5.5 probes, on the average, to locate a
random item in the table. (However, a 90-percent-full table does require about
50.5 probes for every new item inserted by Algorithm L.)
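Both figures follow from the limiting formulas for Algorithm L obtained earlier in this section, evaluated at $\alpha = 0.9$:

```latex
C_N \approx \tfrac12\Bigl(1 + \frac{1}{1-\alpha}\Bigr) = \tfrac12(1 + 10) = 5.5,
\qquad
C'_N \approx \tfrac12\Bigl(1 + \frac{1}{(1-\alpha)^2}\Bigr) = \tfrac12(1 + 100) = 50.5 .
```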

Figure 44 shows that the chaining methods are quite economical with
respect to the number of probes, but the extra memory space needed for link
fields sometimes makes open addressing more attractive for small records. For
example, if we have to choose between a chained hash table of capacity 500 and
an open hash table of capacity 1000, the latter is clearly preferable, since it allows
efficient searching when 500 records are present and it is capable of absorbing
twice as much data. On the other hand, sometimes the record size and format
will allow space for link fields at virtually no extra cost. (See exercise 65.)

How do hash methods compare with the other search strategies we have
studied in this chapter? From the standpoint of speed we can argue that they
are better, when the number of records is large, because the average search time
for a hash method stays bounded as $N \to \infty$ if we stipulate that the table never
gets too full. For example, Program L will take only about 55 units of time for
a successful search when the table is 90 percent full; this beats the fastest MIX
binary search routine we have seen (exercise 6.2.1–24) when $N$ is greater than 600
or so, at the cost of only 11 percent in storage space. Moreover the binary search
is suitable only for fixed tables, while a hash table allows efficient insertions.

We can also compare Program L to the tree-oriented search methods that
allow dynamic insertions. Program L with a 90-percent-full table is faster than
Program 6.2.2T when $N$ is greater than about 90, and faster than Program 6.3D
(exercise 6.3–9) when $N$ is greater than about 75.

Only one search method in this chapter is efficient for successful searching
with virtually no storage overhead, namely Brent's variation of Algorithm D.
His method allows us to put $N$ records into a table of size $M = N + 1$, and
to find any record in about 2.5 probes on the average. No extra space for link
fields or tag bits is needed; however, an unsuccessful search will be very slow,
requiring about $N/2$ probes.

Thus hashing has several advantages. On the other hand, there are three
important respects in which hash table searching is inferior to other methods:

a) After an unsuccessful search in a hash table, we know only that the
desired key is not present. Search methods based on comparisons always yield
more information; they allow us to find the largest key $< K$ and/or the smallest
key $> K$. This is important in many applications; for example, it allows us to
interpolate function values from a stored table. We can also use comparison-based
algorithms to locate all keys that lie between two given values $K$ and $K'$.
Furthermore the tree search algorithms of Section 6.2 make it easy to traverse
the contents of a table in ascending order, without sorting it separately.
b) The storage allocation for hash tables is often somewhat difficult; we
have to dedicate a certain area of the memory for use as the hash table, and
it may not be obvious how much space should be allotted. If we provide too
much memory, we may be wasting storage at the expense of other lists or other
computer users; but if we don't provide enough room, the table will overflow.
By contrast, the tree search and insertion algorithms deal with trees that grow
no larger than necessary. In a virtual memory environment we can keep memory
accesses localized if we use tree search or digital tree search, instead of creating a
large hash table that requires the operating system to access a new page nearly
every time we hash a key.

c) Finally, we need a great deal of faith in probability theory when we use
hashing methods, since they are efficient only on the average, while their worst
case is terrible! As in the case of random number generators, we can never be
completely sure that a hash function will perform properly when it is applied
to a new set of data. Therefore hash tables are inappropriate for certain
real-time applications such as air traffic control, where people's lives are at stake; the
balanced tree algorithms of Sections 6.2.3 and 6.2.4 are much safer, since they
provide guaranteed upper bounds on the search time.
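Point (a) above can be illustrated in Python, whose built-in `set` is implemented as a hash table. A set can answer only the question "is this key present?". A sorted list searched with the standard `bisect` module also yields the neighbors of a missing key and every key in a range. The helper names `neighbors` and `keys_between` are ours, and the sample keys are arbitrary.

```python
import bisect


def neighbors(sorted_keys, K):
    """Largest key < K and smallest key > K (None where no such key exists)."""
    i = bisect.bisect_left(sorted_keys, K)    # first position with key >= K
    j = bisect.bisect_right(sorted_keys, K)   # first position with key > K
    below = sorted_keys[i - 1] if i > 0 else None
    above = sorted_keys[j] if j < len(sorted_keys) else None
    return below, above


def keys_between(sorted_keys, lo, hi):
    """All keys k with lo <= k <= hi, in ascending order."""
    return sorted_keys[bisect.bisect_left(sorted_keys, lo):bisect.bisect_right(sorted_keys, hi)]


keys = [3, 14, 15, 26, 53, 58, 97]
hashed = set(keys)                  # a hash table: it can only answer "present?"
print(41 in hashed)                 # False, and nothing more
print(neighbors(keys, 41))          # (26, 53)
print(keys_between(keys, 10, 55))   # [14, 15, 26, 53]
```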

History. The idea of hashing appears to have been originated by H. P. Luhn,
who wrote an internal IBM memorandum in January 1953 that suggested the
use of chaining; in fact, his suggestion was one of the first applications of linked
linear lists. He pointed out the desirability of using buckets that contain more
than one element, for external searching. Shortly afterwards, A. D. Lin carried
Luhn's analysis further, and suggested a technique for handling overflows that
used “degenerative addresses”; for example, the overflows from primary bucket
2748 were put in secondary bucket 274; overflows from that bucket went to
tertiary bucket 27, and so on, assuming the presence of 10000 primary buckets,
1000 secondary buckets, 100 tertiary buckets, etc. The hash functions originally
suggested by Luhn were digital in nature; for example, he combined adjacent
pairs of key digits by adding them mod 10, so that 31415926 would be compressed
to 4548.
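Both digital ideas described above are simple enough to state as code. In this Python sketch the function names are ours, keys are assumed to be given as strings of decimal digits, and the folding function assumes an even number of digits, since the text does not say how Luhn handled an odd one.

```python
def luhn_fold(digits: str) -> str:
    """Add adjacent pairs of decimal digits mod 10, e.g. '31415926' -> '4548'.
    Assumes an even number of digits (an assumption of this sketch)."""
    if len(digits) % 2:
        raise ValueError("expected an even number of digits")
    return "".join(str((int(digits[i]) + int(digits[i + 1])) % 10)
                   for i in range(0, len(digits), 2))


def degenerative_addresses(primary: str) -> list:
    """Lin's overflow sequence: drop one trailing digit per level,
    e.g. primary 2748 -> secondary 274 -> tertiary 27 -> ..."""
    return [primary[:i] for i in range(len(primary), 0, -1)]


print(luhn_fold("31415926"))            # the compressed key
print(degenerative_addresses("2748"))   # primary, secondary, tertiary, ...
```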

At about the same time the idea of hashing occurred independently to
another group of IBMers: Gene M. Amdahl, Elaine M. Boehm, N. Rochester,
and Arthur L. Samuel, who were building an assembly program for the IBM 701.
In order to handle the collision problem, Amdahl originated the idea of open
addressing with linear probing. [See also Derr and Luke, JACM 3 (1956), 303.]

Hash coding was first described in the open literature by Arnold I. Dumey,
Computers and Automation 5, 12 (December 1956), 6–9. He was the first to
mention the idea of dividing by a prime number and using the remainder as
the hash address. Dumey's interesting article mentions chaining but not open
addressing. A. P. Ershov of Russia independently discovered linear open
addressing in 1957 [Doklady Akad. Nauk SSSR 118 (1958), 427–430]; he published
empirical results about the number of probes, conjecturing correctly that the
average number of probes per successful search is < 2 when N/M < 2/3.
A classic article by W. W. Peterson, IBM J. Research & Development 1
(1957), 130–146, was the first major paper dealing with the problem of
searching in large files. Peterson defined open addressing in general, analyzed the
performance of uniform probing, and gave numerous empirical statistics about
the behavior of linear open addressing with various bucket sizes, noting the
degradation in performance that occurred when items were deleted. Another
comprehensive survey of the subject was published six years later by Werner
Buchholz [IBM Systems J. 2 (1963), 86–111], who gave an especially good
discussion of hash functions. Correct analyses of Algorithm L were first
published by A. G. Konheim and B. Weiss, SIAM J. Appl. Math. 14 (1966), 1266–1274;
V. Podderjugin, Wissenschaftliche Zeitschrift der Technischen Universität
Dresden 17 (1968), 1087–1089.

Up to this time linear probing was the only type of open addressing scheme
that had appeared in the literature, but another scheme based on repeated
random probing by independent hash functions had independently been developed
by several people (see exercise 48). During the next few years hashing became
very widely used, but hardly anything more was published about it. Then Robert
Morris wrote a very influential survey of the subject [CACM 11 (1968), 38–44],
in which he introduced the idea of random probing with secondary clustering.
Morris's paper touched off a flurry of activity that culminated in Algorithm D
and its refinements.

It is interesting to note that the word “hashing” apparently never appeared
in print, with its present meaning, until the late 1960s, although it had already
become common jargon in several parts of the world by that time. The first
published appearance of the word seems to have been in H. Hellerman's book
Digital Computer System Principles (New York: McGraw-Hill, 1967), 152; the
only previous occurrence among approximately 60 relevant documents studied by
the author as this section was being written was in an unpublished memorandum
written by W. W. Peterson in 1961. Somehow the verb “to hash” magically
became standard terminology for key transformation during the mid-1960s, yet
nobody was rash enough to use such an undignified word in print until 1967!

Later developments. Many advances in the theory and practice of hashing
have been made since the author first prepared this chapter in 1972, although
the basic ideas discussed above still remain useful for ordinary applications. For
example, the book Design and Analysis of Coalesced Hashing by J. S. Vitter
and W.-C. Chen (New York: Oxford Univ. Press, 1987) discusses and analyzes
several instructive variants of Algorithm C.

From a practical standpoint, the most important hash technique invented in
the late 1970s is probably the method that Witold Litwin called linear hashing
[Proc. 6th International Conf. on Very Large Databases (1980), 212–223]. Linear
hashing (which incidentally has nothing to do with the classical technique of
linear probing) allows the number of hash addresses to grow and/or contract
gracefully as items are inserted and/or deleted. An excellent discussion of linear
hashing, including comparisons with other methods for internal searching, has
been given by Per-Åke Larson in CACM 31 (1988), 446–457; see also W. G.
Griswold and G. M. Townsend, Software Practice & Exp. 23 (1993), 351–367,
for improvements when many large and/or small tables are present
simultaneously. Linear hashing can also be used for huge databases that are distributed
between many different sites on a network [see Litwin, Neimat, and Schneider,
ACM Trans. Database Syst. 21 (1996), 480–525]. An alternative scheme called
extendible hashing, which has the property that at most two references to external
pages are needed to retrieve any record, was proposed at about the same time by
R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong [ACM Trans. Database
Syst. 4 (1979), 315–344]; related ideas had been explored by G. D. Knott, Proc.
ACM-SIGFIDET Workshop on Data Description, Access and Control (1971),
187–206. Both linear hashing and extendible hashing are preferable to the
B-trees of Section 6.2.4, when the order of keys is unimportant.

In the theoretical realm, more complicated methods have been devised by
which it is possible to guarantee $O(1)$ maximum time per access, with $O(1)$
average amortized time per insertion and deletion, regardless of the keys being
examined; moreover, the total storage used at any time is bounded by a constant
times the number of items currently present, plus another additive constant.
This result, which builds on ideas of Fredman, Komlós, and Szemerédi [JACM
31 (1984), 538–544], is due to Dietzfelbinger, Karlin, Mehlhorn, Meyer auf der
Heide, Rohnert, and Tarjan [SICOMP 23 (1994), 738–761].

EXERCISES 

1. [20] When the instruction 9H in Table 1 is reached, how small and how large can
the contents of rI1 possibly be, assuming that bytes 1, 2, 3 of K each contain alphabetic
character codes less than 30?

2. [20] Find a reasonably common English word not in Table 1 that could be added
to that table without changing the program.

3. [23] Explain why no program beginning with the five instructions

        LD1   K(1:1)    or    LD1N  K(1:1)
        LD2   K(2:2)    or    LD2N  K(2:2)
        INC1  a,2
        LD2   K(3:3)
        J2Z   9F

could be used in place of the more complicated program in Table 1, for any constant a,
since unique addresses would not be produced for the given keys.

4. [M30] How many people should be invited to a party in order to make it likely
that there are three with the same birthday?

5. [15] Mr. B. C. Dull was writing a FORTRAN compiler using a decimal MIX
computer, and he needed a symbol table to keep track of the names of variables in the
FORTRAN program being compiled. These names were restricted to be at most ten
characters in length. He decided to use a hash table with M = 100, and to use the fast
hash function h(K) = leftmost byte of K. Was this a good idea?

6. [15] Would it be wise to change the first two instructions of (3) to LDA K; ENTX 0?





7. [HM30] (Polynomial hashing.) The purpose of this exercise is to consider the
construction of polynomials $P(x)$ such as (10), which convert $n$-bit keys into $m$-bit
addresses, in such a way that distinct keys differing in $t$ or fewer bits will hash to
different addresses. Given $n$ and $t < n$, and given an integer $k$ such that $n$ divides
$2^k - 1$, we shall construct a polynomial whose degree $m$ is a function of $n$, $t$, and $k$.
(Usually $n$ is increased, if necessary, so that $k$ can be chosen to be reasonably small.)

Let $S$ be the smallest set of integers such that $\{1, 2, \ldots, t\} \subseteq S$ and $(2j) \bmod n \in S$
for all $j \in S$. For example, when $n = 15$, $k = 4$, and $t = 6$, we have
$S = \{1, 2, 3, 4, 5, 6, 8, 10, 12, 9\}$. We now define the polynomial $P(x) = \prod_{j \in S}(x - \alpha^j)$, where $\alpha$ is an
element of order $n$ in the finite field $\mathrm{GF}(2^k)$, and where the coefficients of $P(x)$ are
computed in this field. The degree $m$ of $P(x)$ is the number of elements of $S$. Since
$\alpha^{2j}$ is a root of $P(x)$ whenever $\alpha^j$ is a root, it follows that the coefficients $p_i$ of $P(x)$
satisfy $p_i^2 = p_i$, so they are 0 or 1.

Prove that if $R(x) = r_{n-1}x^{n-1} + \cdots + r_1x + r_0$ is any nonzero polynomial modulo 2,
with at most $t$ nonzero coefficients, then $R(x)$ is not a multiple of $P(x)$ modulo 2.
[It follows that the corresponding hash function behaves as advertised.]

8. [M34] (The three-distance theorem.) Let θ be an irrational number between 0
and 1, whose regular continued fraction representation in the notation of Section 4.5.3
is $\theta = /\!/a_1, a_2, a_3, \ldots/\!/$. Let $q_0 = 0$, $p_0 = 1$, $q_1 = 1$, $p_1 = 0$, and $q_{k+1} = a_kq_k + q_{k-1}$,
$p_{k+1} = a_kp_k + p_{k-1}$ for $k \ge 1$. Let $\{x\}$ denote $x \bmod 1 = x - \lfloor x \rfloor$, and let $\{x\}^+$
denote $x - \lceil x \rceil + 1$. As the points $\{\theta\}, \{2\theta\}, \{3\theta\}, \ldots$ are successively inserted into the
interval [0..1], let the line segments be numbered as they appear in such a way that the
first segment of a given length is number 0, the next is number 1, etc. Prove that the
following statements are all true: Interval number s of length $\{t\theta\}$, where $t = rq_k + q_{k-1}$
and $0 \le r < a_k$ and k is even and $0 \le s < q_k$, has left endpoint $\{s\theta\}$ and right endpoint
$\{(s+t)\theta\}^+$. Interval number s of length $1 - \{t\theta\}$, where $t = rq_k + q_{k-1}$ and $0 \le r < a_k$
and k is odd and $0 \le s < q_k$, has left endpoint $\{(s+t)\theta\}$ and right endpoint $\{s\theta\}^+$.
Every positive integer n can be uniquely represented as $n = rq_k + q_{k-1} + s$ for some
$k \ge 1$, $1 \le r \le a_k$, and $0 \le s < q_k$. In terms of this representation, just before the
point $\{n\theta\}$ is inserted the n intervals present are

the first s intervals (numbered 0, ..., s − 1) of length $\{(-1)^k(rq_k + q_{k-1})\theta\}$;

the first $n - q_k$ intervals (numbered 0, ..., $n - q_k - 1$) of length $\{(-1)^{k+1}q_k\theta\}$;

the last $q_k - s$ intervals (numbered s, ..., $q_k - 1$) of length $\{(-1)^k((r-1)q_k + q_{k-1})\theta\}^+$.
The operation of inserting $\{n\theta\}$ removes interval number s of the third type and
converts it into interval number s of the first type, number $n - q_k$ of the second type.
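A quick floating-point experiment (evidence, not a proof) illustrates that only a few distinct lengths ever appear. The sketch assumes θ = √2 − 1, whose continued fraction is //2, 2, 2, ...//, and lists the interval lengths present just before {nθ} is inserted. The function names are hypothetical.

```python
import math


def interval_lengths(theta, n):
    """Lengths of the n intervals of [0..1] just before {n*theta} is inserted,
    i.e., after the points {theta}, {2 theta}, ..., {(n-1) theta} are present."""
    points = sorted((j * theta) % 1.0 for j in range(1, n))
    edges = [0.0] + points + [1.0]
    return [right - left for left, right in zip(edges, edges[1:])]


def distinct_lengths(lengths, tol=1e-9):
    """Collapse lengths that agree to within floating-point tolerance."""
    result = []
    for x in sorted(lengths):
        if not result or x - result[-1] > tol:
            result.append(x)
    return result


theta = math.sqrt(2) - 1          # continued fraction //2, 2, 2, ...//
for n in range(1, 200):
    assert len(distinct_lengths(interval_lengths(theta, n))) <= 3
print(distinct_lengths(interval_lengths(theta, 3)))  # about [0.1716, 0.4142]
```

For n = 3 we have k = 2, r = 1, s = 0. The two printed lengths are {−2θ} = 3 − 2√2, which is the single interval of the second type, and {θ}, which is shared by the two intervals of the third type.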

9. [M30] When we successively insert the points $\{\theta\}, \{2\theta\}, \ldots$ into the interval
[0..1], Theorem S asserts that each new point always breaks up one of the largest
remaining intervals. If the interval [a..c] is thereby broken into two parts [a..b],
[b..c], we may call it a bad break if one of these parts is more than twice as long as the
other, namely if $b - a > 2(c - b)$ or $c - b > 2(b - a)$.

Prove that bad breaks will occur for some $\{n\theta\}$ unless $\theta \bmod 1 = \phi^{-1}$ or $\phi^{-2}$, and
the latter values of θ never produce bad breaks.
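The definition of a bad break is easy to test by simulation. This hypothetical sketch inserts the points {θ}, {2θ}, ... in floating point and reports the first n whose point makes a bad break. It only checks a finite range, so it is no substitute for the proof.

```python
import bisect
import math


def first_bad_break(theta, limit):
    """Insert {theta}, {2 theta}, ..., {limit*theta} into [0..1]; return the
    first n whose point makes a bad break, or None if none occurs."""
    points = [0.0, 1.0]                      # sorted endpoints seen so far
    for n in range(1, limit + 1):
        b = (n * theta) % 1.0
        i = bisect.bisect_left(points, b)
        a, c = points[i - 1], points[i]      # the interval [a..c] being broken
        if b - a > 2 * (c - b) or c - b > 2 * (b - a):
            return n
        points.insert(i, b)
    return None


phi = (1 + math.sqrt(5)) / 2
print(first_bad_break(1 / phi, 1000))            # None
print(first_bad_break(1 / phi ** 2, 1000))       # None
print(first_bad_break(math.sqrt(2) - 1, 1000))   # 2
```

For θ = √2 − 1 the second point {2θ} splits [θ..1] into pieces of lengths about 0.414 and 0.172, which is already a bad break.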

10. [M38] (R. L. Graham.) If $\theta, \alpha_1, \ldots, \alpha_d$ are real numbers with $\alpha_1 = 0$, and if
$n_1, \ldots, n_d$ are positive integers, and if the points $\{n\theta + \alpha_j\}$ are inserted into the interval
[0..1] for $0 \le n < n_j$ and $1 \le j \le d$, prove that the resulting $n_1 + \cdots + n_d$ (possibly
empty) intervals have at most 3d different lengths.

11.  [16]  Successful  searches  are  often  more  frequent  than  unsuccessful  ones.  Would 
it  therefore  be  a good  idea  to  interchange  lines  12-13  of  Program  C with  lines  10-11? 
► 12. [21] Show that Program C can be rewritten so that there is only one conditional
jump  instruction  in  the  inner  loop.  Compare  the  running  time  of  the  modified  program 
with  the  original. 

► 13.  [24]  ( Abbreviated  keys.)  Let  h(K)  be  a hash  function,  and  let  q(K)  be  a function 
of  K such  that  K can  be  determined  once  h(K)  and  q(K)  are  given.  For  example,  in 
division hashing we may let h(K) = K mod M and q(K) = ⌊K/M⌋; in multiplicative
hashing we may let h(K) be the leading bits of (AK/w) mod 1, and q(K) can be the
other  bits. 

Show  that  when  chaining  is  used  without  overlapping  lists,  we  need  only  store  q(K) 
instead  of  K in  each  record.  (This  almost  saves  the  space  needed  for  the  link  fields.) 
Modify  Algorithm  C so  that  it  allows  such  abbreviated  keys  by  avoiding  overlapping 
lists,  yet  uses  no  auxiliary  storage  locations  for  overflow  records. 
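Both decompositions can be sketched in a few lines of code. All parameter values are illustrative assumptions. Division hashing uses M = 1000. Multiplicative hashing uses a word size w = 2^16, M = 2^6, and an odd multiplier A = 40503 (close to w/φ). Because A is odd it is invertible modulo w. The full product AK mod w, and hence K itself, can therefore be rebuilt from its leading bits h(K) and its remaining bits q(K).

```python
# Sketch for exercise 13; all parameter values are illustrative assumptions.

def division_parts(K, M):
    """Division hashing: h(K) = K mod M and q(K) = floor(K / M)."""
    return K % M, K // M


def division_recover(h, q, M):
    return q * M + h


W_BITS, M_BITS = 16, 6     # assumed word size w = 2^16 and table size M = 2^6
w = 1 << W_BITS
A = 40503                  # assumed odd multiplier, close to w/phi


def multiplicative_parts(K):
    """Multiplicative hashing for 0 <= K < w: (A*K mod w) holds the bits of
    (AK/w) mod 1; h(K) is its leading M_BITS bits, q(K) the remaining bits."""
    frac = (A * K) % w
    low_bits = W_BITS - M_BITS
    return frac >> low_bits, frac & ((1 << low_bits) - 1)


def multiplicative_recover(h, q):
    frac = (h << (W_BITS - M_BITS)) | q
    return (frac * pow(A, -1, w)) % w     # A odd, so invertible mod w (Python 3.8+)


print(division_parts(123456, 1000))                          # (456, 123)
print(multiplicative_recover(*multiplicative_parts(31415)))  # 31415
```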

14.  [24]  (E.  W.  Elcock.)  Show  that  it  is  possible  to  let  a large  hash  table  share 
memory  with  any  number  of  other  linked  lists.  Let  every  word  of  the  list  area  have  a 
2-bit  TAG  field  and  two  link  fields  called  LINK  and  AUX,  with  the  following  interpretation: 

TAG(P)  = 0 indicates  a word  in  the  list  of  available  space;  LINK(P)  points  to  the 
next  entry  in  this  list,  and  AUX(P)  is  unused. 

TAG(P)  = 1 indicates  a word  in  use  where  P is  not  the  hash  address  of  any  key  in 
the  hash  table;  the  other  fields  of  the  word  in  location  P may  have  any  desired 
format. 

TAG(P)  = 2 indicates  that  P is  the  hash  address  of  at  least  one  key;  AUX(P)  points 
to  a linked  list  specifying  all  such  keys,  and  LINK(P)  points  to  another  word 
in  the  list  memory.  Whenever  a word  with  TAG(P)  = 2 is  accessed  during  the 
processing of any list, we set P ← LINK(P) repeatedly until reaching a word
with TAG(P) ≤ 1. (For efficiency we might also then change prior links so that
it  will  not  be  necessary  to  skip  over  the  same  entries  again  and  again.) 

Define  suitable  algorithms  for  inserting  and  retrieving  keys  in  such  a hash  table. 

15.  [16]  Why  is  it  a good  idea  for  Algorithm  L and  Algorithm  D to  signal  overflow 
when N = M − 1 instead of when N = M?

16.  [10]  Program  L says  that  K should  not  be  zero.  But  doesn’t  it  actually  work 
even  when  K is  zero? 

17. [15] Why not simply define $h_2(K) = h_1(K)$ in (25), when $h_1(K) \ne 0$?

► 18. [21] Is (31) better or worse than (30), as a substitute for lines 10-13 of Program D?
Give your answer on the basis of the average values of A, S1, and C.

19. [40] Empirically test the effect of restricting the range of $h_2(K)$ in Algorithm D,
so that (a) $1 \le h_2(K) \le r$ for r = 1, 2, 3, ..., 10; (b) $1 \le h_2(K) \le \rho M$ for
$\rho = \frac{1}{10}, \frac{2}{10}, \ldots, \frac{9}{10}$.

20. [M25] (R. Krutar.) Change Algorithm D as follows, avoiding the hash function
$h_2(K)$: In step D3, set c ← 0; and at the beginning of step D4, set c ← c + 1.
Prove that if $M = 2^m$, the corresponding probe sequence $h_1(K)$, $(h_1(K) - 1) \bmod M$,
..., $\bigl(h_1(K) - \binom{M}{2}\bigr) \bmod M$ will be a permutation of {0, 1, ..., M − 1}. When this
“quadratic  probing”  method  is  programmed  for  MIX,  how  does  it  compare  with  the 
three  programs  considered  in  Fig.  42,  assuming  that  the  algorithm  behaves  like  random 
probing  with  secondary  clustering? 
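The permutation property can be checked exhaustively for small powers of 2. In this sketch (the function name is hypothetical), c starts at 0 and is increased before each decrement of i. The jth probe is therefore $h_1(K)$ minus the triangular number 1 + 2 + ... + j, taken mod M.

```python
def krutar_probes(h1, M):
    """Probe sequence of exercise 20 (hypothetical helper): c starts at 0 and is
    increased before each step, so the j-th probe is h1 - (1 + 2 + ... + j) mod M."""
    probes, i, c = [], h1, 0
    for _ in range(M):
        probes.append(i)
        c += 1                 # beginning of step D4: c <- c + 1
        i = (i - c) % M        # then i <- i - c, wrapping around mod M
    return probes


for m in range(1, 10):         # M = 2, 4, ..., 512
    M = 2 ** m
    for h1 in range(M):
        assert sorted(krutar_probes(h1, M)) == list(range(M))

print(krutar_probes(0, 8))     # [0, 7, 5, 2, 6, 1, 3, 4]
print(krutar_probes(0, 3))     # [0, 2, 0]: not a permutation when M = 3
```

The last line shows why the hypothesis $M = 2^m$ matters: with M = 3 the sequence revisits cell 0 before reaching cell 1.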
► 21. [20] Suppose that we wish to delete a record from a table constructed by Algorithm D,
marking it “deleted” as suggested in the text. Should we also decrease the
variable  N that  is  used  to  govern  Algorithm  D? 

22.  [27]  Prove  that  Algorithm  R leaves  the  table  exactly  as  it  would  have  been  if 
KEY  [i]  had  never  been  inserted  in  the  first  place. 

► 23.  [33]  Design  an  algorithm  analogous  to  Algorithm  R,  for  deleting  entries  from  a 
chained  hash  table  that  has  been  constructed  by  Algorithm  C. 

24. [M20] Suppose that the set of all possible keys that can occur has MP elements,
where exactly P keys hash to any given address. (In practical cases, P is very large; for
example, if the keys are arbitrary 10-digit numbers and if $M = 10^3$, we have $P = 10^7$.)
Assume that M ≥ 7 and N = 7. If seven distinct keys are selected at random from the
set of all possible keys, what is the exact probability that the hash sequence 1 2 6 2 1 6 1
will be obtained (namely that $h(K_1) = 1$, $h(K_2) = 2$, ..., $h(K_7) = 1$), as a function of
M and  P? 

25.  [M19]  Explain  why  Eq.  (39)  is  true. 

26. [M20] How many hash sequences $a_1 a_2 \ldots a_9$ yield the pattern of occupied cells
(21), using linear probing?

27. [M27] Complete the proof of Theorem K. [Hint: Let

$$s(n, x, y) = \sum_k \binom{n}{k}(x+k)^{k+1}(y-k)^{n-k-1}(y-n);$$

use Abel's binomial theorem, Eq. 1.2.6-(16), to prove that $s(n, x, y) = x(x+y)^n +
n\,s(n-1,\, x+1,\, y-1)$.]

28. [M30] In the old days when computers were much slower than they are now, it
was  possible  to  watch  the  lights  flashing  and  see  how  fast  Algorithm  L was  running. 
When  the  table  began  to  fill  up,  some  entries  would  be  processed  very  quickly,  while 
others  took  a great  deal  of  time. 

This  experience  suggests  that  the  standard  deviation  of  the  number  of  probes  in 
an  unsuccessful  search  is  rather  high,  when  linear  probing  is  used.  Find  a formula  that 
expresses the variance in terms of the $Q_r$ functions defined in Theorem K, and estimate
the variance when $N = \alpha M$ as $M \to \infty$.

29.  [M21]  ( The  parking  problem.)  A certain  one-way  street  has  m parking  spaces  in 
a row,  numbered  1 through  m.  A man  and  his  dozing  wife  drive  by,  and  suddenly  she 
wakes  up  and  orders  him  to  park  immediately.  He  dutifully  parks  at  the  first  available 
space;  but  if  there  are  no  places  left  that  he  can  get  to  without  backing  up  (that  is,  if 
his  wife  awoke  when  the  car  approached  space  k,  but  spaces  k,  k + 1,  . . . , m are  all 
full),  he  expresses  his  regrets  and  drives  on. 

Suppose,  in  fact,  that  this  happens  for  n different  cars,  where  the  jth  wife  wakes 
up just in time to park at space $a_j$. In how many of the sequences $a_1 \ldots a_n$ will all of
the cars get safely parked, assuming that the street is initially empty and that nobody
leaves after parking? For example, when m = n = 9 and $a_1 \ldots a_9$ = 3 1 4 1 5 9 2 6 5,
the  cars  get  parked  as  follows: 

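Spaces 1 through 9 are then occupied by cars 2, 4, 1, 3, 5, 7, 8, 9, 6, respectively (numbering the cars in order of arrival).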

[Hint:  Use  the  analysis  of  linear  probing.] 
30. [M38] When n = m in the parking problem of exercise 29, show that all cars get
parked if and only if there exists a permutation $p_1 p_2 \ldots p_n$ of {1, 2, ..., n} such that
$a_j \le p_j$ for all j.

31.  [M40]  When  n = m in  the  parking  problem  of  exercise  29,  the  number  of  solutions 
turns out to be $(n+1)^{n-1}$; and from exercise 2.3.4.4-22 we know that this is the same
as  the  number  of  free  trees  on  n + 1 labeled  vertices!  Find  an  interesting  connection 
between  parking  sequences  and  trees. 
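The parking process, the criterion of exercise 30, and the count quoted in exercise 31 can all be checked by brute force for small n. The helper names in this sketch are invented. The function has_dominating_permutation restates the condition of exercise 30 in an equivalent sorted form: the ith smallest $a_j$ is at most i. The loop in the checks compares that criterion against direct simulation.

```python
from itertools import product


def park(a, m):
    """Park cars 1, 2, ... whose wives wake at spaces a[0], a[1], ... on a
    street with spaces 1..m. Return the car number in each space (None if the
    space stays empty), or None as soon as some car must drive on."""
    street = [None] * (m + 1)              # index 0 unused
    for car, start in enumerate(a, 1):
        space = start
        while space <= m and street[space] is not None:
            space += 1                     # no backing up: only move forward
        if space > m:
            return None
        street[space] = car
    return street[1:]


def count_parking_sequences(n):
    """Number of sequences a_1 ... a_n (1 <= a_j <= n) that park all n cars."""
    return sum(park(a, n) is not None for a in product(range(1, n + 1), repeat=n))


def has_dominating_permutation(a):
    """Exercise 30's condition in sorted form: the i-th smallest a_j is <= i."""
    return all(x <= i for i, x in enumerate(sorted(a), 1))


print(park([3, 1, 4, 1, 5, 9, 2, 6, 5], 9))                 # [2, 4, 1, 3, 5, 7, 8, 9, 6]
print([count_parking_sequences(n) for n in range(1, 6)])    # [1, 3, 16, 125, 1296]
```

The first line replays the example of exercise 29. The second line agrees with $(n+1)^{n-1}$ for n = 1, ..., 5.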

32. [M27] Prove that the system of equations (44) has a unique solution $(c_0, c_1, \ldots,
c_{M-1})$, whenever $b_0, b_1, \ldots, b_{M-1}$ are nonnegative integers whose sum is less than M.
Design  an  algorithm  to  find  that  solution. 

► 33. [M23] Explain why (51) is only an approximation to the true average number of
probes  made  by  Algorithm  L.  What  was  there  about  the  derivation  of  (51)  that  wasn’t 
rigorously  exact? 

► 34. [M23] The purpose of this exercise is to investigate the average number of probes
in  a chained  hash  table  when  the  lists  are  kept  separate  as  in  Fig.  38. 

a) What is $P_{Nk}$, the probability that a given list has length k, when the $M^N$ hash
sequences (35) are equally likely?

b) Find the generating function $P_N(z) = \sum_{k \ge 0} P_{Nk}z^k$.

c)  Express  the  average  number  of  probes  for  a successful  search  in  terms  of  this 
generating  function. 

d)  Deduce  the  average  number  of  probes  in  an  unsuccessful  search,  considering  vari- 
ants of  the  data  structure  in  which  the  following  conventions  are  used:  (i)  hashing 
is  always  to  a list  head  (see  Fig.  38);  (ii)  hashing  is  to  a table  position  (see  Fig.  40), 
but  all  keys  except  the  first  of  a list  go  into  a separate  overflow  area;  (iii)  hashing 
is  to  a table  position  and  all  entries  appear  in  the  hash  table. 

35.  [M24]  Continuing  exercise  34,  what  is  the  average  number  of  probes  in  an  unsuc- 
cessful search  when  the  individual  lists  are  kept  in  order  by  their  key  values?  Consider 
data  structures  (i),  (ii),  and  (iii). 

36.  [M23]  Continuing  exercise  34(d),  find  the  variance  of  the  number  of  probes  when 
the  search  is  unsuccessful,  using  data  structures  (i)  and  (ii). 

► 37. [M29] Equation (19) gives the average number of probes in separate chaining
when  the  search  is  successful;  what  is  the  variance  of  that  number  of  probes? 

38.  [M32]  (Tree  hashing.)  A clever  programmer  might  try  to  use  binary  search  trees 
instead  of  linear  lists  in  the  chaining  method,  thereby  combining  Algorithm  6.2.2T 
with  hashing.  Analyze  the  average  number  of  probes  that  would  be  required  by 
this compound algorithm, for both successful and unsuccessful searches. [Hint: See
Eq. 5.2.1-(15).]

39. [M28] Let $c_N(k)$ be the total number of lists of length k formed when Algorithm C
is applied to all $M^N$ hash sequences (35). Find a recurrence relation on the numbers
$c_N(k)$ that makes it possible to determine a simple formula for the sum

sN  = y^W). 

How  is  Sn  related  to  the  number  of  probes  in  an  unsuccessful  search  by  Algorithm  C? 

40.  [M33]  Equation  (15)  gives  the  average  number  of  probes  used  by  Algorithm  C in 
an  unsuccessful  search;  what  is  the  variance  of  that  number  of  probes? 
41. [M40] Analyze $T_N$, the average number of times the index R is decreased by 1
when the (N + 1)st item is being inserted by Algorithm C.

► 42.  [M20]  Derive  (17),  the  probability  that  Algorithm  C succeeds  immediately. 

43.  [HM44]  Analyze  a modification  of  Algorithm  C that  uses  a table  of  size  M'  > M. 
Only the first M locations are used for hashing, so the first M' − M empty nodes found
in step C5 will be in the extra locations of the table. For fixed M', what choice of M
in the range 1 ≤ M ≤ M' leads to the best performance?

44. [M43] (Random probing with secondary clustering.) The object of this exercise is
to  determine  the  expected  number  of  probes  in  the  open  addressing  scheme  with  probe 
sequence 

$$h(K),\ (h(K) + p_1) \bmod M,\ (h(K) + p_2) \bmod M,\ \ldots,\ (h(K) + p_{M-1}) \bmod M,$$

where $p_1 p_2 \ldots p_{M-1}$ is a randomly chosen permutation of {1, 2, ..., M − 1} that depends
on h(K). In other words, all keys with the same value of h(K) follow the same probe
sequence, and the $((M-1)!)^M$ possible choices of M probe sequences with this property
are  equally  likely. 

This  situation  can  be  modeled  accurately  by  the  following  experimental  procedure 
performed  on  an  initially  empty  linear  array  of  size  m.  Do  the  following  operation  n 
times:  “With  probability  p,  occupy  the  leftmost  empty  position.  Otherwise  (that  is, 
with probability q = 1 − p), select any table position except the one at the extreme
left, with each of these m − 1 positions equally likely. If the selected position is empty,
occupy  it;  otherwise  select  any  empty  position  (including  the  leftmost)  and  occupy  it, 
considering  each  of  the  empty  positions  equally  likely.” 

For  example,  when  m = 5 and  n = 3,  the  array  configuration  after  such  an 
experiment  will  be  (occupied,  occupied,  empty,  occupied,  empty)  with  probability 

$$\tfrac{7}{192}qqq + \tfrac{1}{6}pqq + \tfrac{1}{6}qpq + \tfrac{11}{64}qqp + \tfrac{1}{3}ppq + \tfrac{1}{4}pqp + \tfrac{1}{4}qpp.$$

(This  procedure  corresponds  to  random  probing  with  secondary  clustering,  when  p = 
1/m,  since  we  can  renumber  the  table  entries  so  that  a particular  probe  sequence  is  0, 
1,  2,  ...  and  all  the  others  are  random.) 

Find  a formula  for  the  average  number  of  occupied  positions  at  the  left  of  the 
array  (namely  2 in  the  example  above).  Also  find  the  asymptotic  value  of  this  quantity 
when $p = 1/m$, $n = \alpha(m+1)$, and $m \to \infty$.
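The experimental procedure is small enough to carry out exactly with rational arithmetic. This sketch uses a hypothetical helper with positions numbered 1 to m. For every string of p/q choices, it tracks the conditional distribution of the occupied set. For m = 5 and n = 3, the probabilities of the configuration (occupied, occupied, empty, occupied, empty) are the coefficients of the monomials ppq, pqp, qpp, pqq, qpq, qqp, and qqq in the probability quoted in the example.

```python
from collections import defaultdict
from fractions import Fraction


def experiment(m, n):
    """Exact distribution of the experiment of exercise 44 on positions 1..m.

    Returns {(choices, occupied): probability}, where choices is a string of
    'p'/'q' recording which branch each of the n steps took; the probability
    is conditional on those choices, so the coefficient of a monomial such as
    p*q*q can be read off directly."""
    states = {("", frozenset()): Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for (choices, occ), pr in states.items():
            empties = [i for i in range(1, m + 1) if i not in occ]
            # Branch p: occupy the leftmost empty position.
            nxt[choices + "p", occ | {empties[0]}] += pr
            # Branch q: select one of positions 2..m, each with probability 1/(m-1).
            w = pr / (m - 1)
            for pos in range(2, m + 1):
                if pos not in occ:
                    nxt[choices + "q", occ | {pos}] += w
                else:  # already occupied: choose a uniformly random empty position
                    for e in empties:
                        nxt[choices + "q", occ | {e}] += w / len(empties)
        states = nxt
    return states


target = frozenset({1, 2, 4})   # (occupied, occupied, empty, occupied, empty)
coeffs = {c: pr for (c, occ), pr in experiment(5, 3).items() if occ == target}
for c in sorted(coeffs):
    print(c, coeffs[c])
```

The same routine, averaged over the binomial weights of the choice strings, could be used to tabulate the number of occupied positions at the left for small m and n before attempting the general formula.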

45.  [M43]  Solve  the  analog  of  exercise  44  with  tertiary  clustering,  when  the  probe 
sequence begins $h_1(K)$, $(h_1(K) + h_2(K)) \bmod M$, and the succeeding probes are randomly
chosen depending only on $h_1(K)$ and $h_2(K)$. (Thus the $((M-2)!)^{M(M-1)}$ possible
choices of M(M − 1) probe sequences with this property are considered to be equally
likely.)  Is  this  procedure  asymptotically  equivalent  to  uniform  probing? 

46. [M42] Determine $C'_N$ and $C_N$ for the open addressing method that uses the probe
sequence 

$$h(K),\ 0,\ 1,\ \ldots,\ h(K) - 1,\ h(K) + 1,\ \ldots,\ M - 1.$$

47.  [M25]  Find  the  average  number  of  probes  needed  by  open  addressing  when  the 
probe  sequence  is 

$$h(K),\ h(K) - 1,\ h(K) + 1,\ h(K) - 2,\ h(K) + 2,\ \ldots.$$

This  probe  sequence  was  once  suggested  because  all  the  distances  between  consecutive 
probes  are  distinct  when  M is  even.  [Hint:  Find  the  trick  and  this  problem  is  easy.] 
► 48. [M21] Analyze the open addressing method that probes locations $h_1(K)$, $h_2(K)$,
$h_3(K)$, ..., given an infinite sequence of mutually independent random hash functions
$\langle h_n(K) \rangle$. In this setup it is possible to probe the same location twice, for example if
$h_1(K) = h_2(K)$, but such coincidences are rather unlikely until the table gets full.

49. [HM24] Generalizing exercise 34 to the case of b records per bucket, determine the
average number of probes (external memory accesses) $C_N$ and $C'_N$, for chaining with
separate lists, assuming that a list containing k elements requires $\max(1, k - b + 1)$
probes in an unsuccessful search. Instead of using the exact probability $P_{Nk}$ as in
exercise 34, use the Poisson approximation

$$\binom{N}{k}\Bigl(\frac{1}{M}\Bigr)^k\Bigl(1 - \frac{1}{M}\Bigr)^{N-k} = \frac{N}{M}\,\frac{N-1}{M}\cdots\frac{N-k+1}{M}\Bigl(1 - \frac{1}{M}\Bigr)^{N}\Bigl(1 - \frac{1}{M}\Bigr)^{-k}\frac{1}{k!} = \frac{e^{-\rho}\rho^k}{k!}\bigl(1 + O(k^2/M)\bigr),$$

which is valid for $N = \rho M$ and $k \le \sqrt{M}$ as $M \to \infty$; derive formulas (57) and (58).
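The quality of the Poisson approximation is easy to see numerically. The sketch uses illustrative sizes M = 10,000 and N = 5,000, so ρ = 0.5. It compares the exact binomial value on the left with $e^{-\rho}\rho^k/k!$ for small k. The relative errors come out small, in line with the $O(k^2/M)$ term. The helper names are hypothetical, and math.comb needs Python 3.8 or later.

```python
import math


def binomial_prob(N, M, k):
    """Exact P_{Nk}: C(N, k) (1/M)^k (1 - 1/M)^(N-k)."""
    return math.comb(N, k) * (1 / M) ** k * (1 - 1 / M) ** (N - k)


def poisson_prob(rho, k):
    """Poisson approximation e^(-rho) rho^k / k!."""
    return rho ** k * math.exp(-rho) / math.factorial(k)


M, N = 10_000, 5_000          # illustrative sizes, so rho = N/M = 0.5
for k in range(6):
    b, p = binomial_prob(N, M, k), poisson_prob(N / M, k)
    print(k, round(b, 6), round(p, 6), f"relative error {b / p - 1:.2e}")
```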

50. [M20] Show that $Q_1(M, N) = M - (M - N - 1)Q_0(M, N)$, in the notation of (42).

[Hint: Prove first that $Q_1(M, N) = (N+1)Q_0(M, N) - NQ_0(M, N-1)$.]

51. [HM17] Express the function $R(\alpha, n)$ defined in (55) in terms of the function $Q_0$
defined in (42).

52. [HM20] Prove that $Q_0(M, N) = \int_0^\infty e^{-t}(1 + t/M)^N\,dt$.
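Exercises 50 and 52 can be sanity-checked numerically. This sketch assumes that (42) defines $Q_r(M, N) = \sum_k \binom{r+k}{k}\frac{N}{M}\frac{N-1}{M}\cdots\frac{N-k+1}{M}$ and evaluates it exactly with fractions. It verifies the identity of exercise 50 and its hint for small M and N. It then compares $Q_0(10, 5)$ with a Simpson's-rule estimate of the integral in exercise 52. A numerical match is evidence, not a proof, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, exp


def Q(r, M, N):
    """Q_r(M, N), assuming (42): sum over k of C(r+k, k) (N/M)((N-1)/M)...((N-k+1)/M)."""
    total, term = Fraction(0), Fraction(1)
    for k in range(N + 1):
        total += comb(r + k, k) * term
        term *= Fraction(N - k, M)     # extend the falling-factorial product
    return total


def integral_Q0(M, N, upper=200.0, steps=40_000):
    """Simpson's-rule estimate of the integral of e^(-t) (1 + t/M)^N over [0, upper];
    the tail beyond `upper` is negligible for the small M, N used here."""
    def f(t):
        return exp(-t) * (1 + t / M) ** N

    h = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


# Exercise 50 and its hint, checked exactly for small M and N.
for M in range(1, 9):
    for N in range(1, 12):
        assert Q(1, M, N) == M - (M - N - 1) * Q(0, M, N)
        assert Q(1, M, N) == (N + 1) * Q(0, M, N) - N * Q(0, M, N - 1)

print(float(Q(0, 10, 5)), integral_Q0(10, 5))   # both about 1.7732
```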

53. [HM20] Prove that the function $R(\alpha, n)$ can be expressed in terms of the incomplete
gamma function, and use the result of exercise 1.2.11.3-9 to find the asymptotic
value of $R(\alpha, n)$ to $O(n^{-2})$ as $n \to \infty$, for fixed $\alpha < 1$.

54. [HM28] Show that when b = 1, Eq. (61) is equivalent to Eq. (23). Hint: We have

t M - (~1)n~1  V (-"°0m 

n\  a m(m  — l)(m  — n — 1)! 

' ' ' ' 


55. [HM43] Generalize the Schay-Spruth model, discussed after Theorem P, to the
case of M buckets of size b. Prove that $C(z)$ is equal to $Q(z)/(B(z) - z^b)$, where $Q(z)$
is a polynomial of degree b and $Q(1) = 0$. Show that the average number of probes is


1 

1 - qb- 1 


1 B"(l)  — 6(6  — 1)  \ 

2 B'(l)  — b ) ’ 


where $q_1, \ldots, q_{b-1}$ are the roots of $Q(z)/(z - 1)$. Replacing the binomial probability
distribution $B(z)$ by the Poisson approximation $P(z) = e^{b\alpha(z-1)}$, where $\alpha = N/Mb$,
and using Lagrange's inversion formula (see Eq. 2.3.4.4-(21) and exercise 4.7-8), reduce
your  answer  to  Eq.  (61). 

56. [HM43] Generalize Theorem K, obtaining an exact analysis of linear probing with
buckets of size b. What is the asymptotic number of probes in a successful search when
the table is full (N = Mb)?

57.  [M47]  Does  the  uniform  assignment  of  probabilities  to  probe  sequences  give  the 
minimum  value  of  Cat,  over  all  open  addressing  methods? 

58.  [M21]  (S.  C.  Johnson.)  Find  ten  permutations  on  {0, 1, 2,3,4}  that  are  equivalent 
to  uniform  probing  in  the  sense  of  Theorem  U. 
59. [M25] Prove that if an assignment of probabilities to permutations is equivalent to
uniform probing, in the sense of Theorem U, the number of permutations with nonzero
probabilities exceeds $M^a$ for any fixed exponent a, when M is sufficiently large.

60.  [M47]  Let  us  say  that  an  open  addressing  scheme  involves  single  hashing  if  it  uses 
exactly  M probe  sequences,  one  beginning  with  each  possible  value  of  h(K),  each  of 
which  occurs  with  probability  1/M. 

Are  the  best  single-hashing  schemes  (in  the  sense  of  minimum  Cat)  asymptotically 
better  than  the  random  ones  described  by  (29)?  In  particular,  is  C«m  > 1 + + 

§a2  + 0(a3)  asM->  00? 

61. [M46] Is the method analyzed in exercise 46 the worst possible single-hashing
scheme,  in  the  sense  of  exercise  60? 

62. [M49] A single hashing scheme is called cyclic if the increments $p_1 p_2 \ldots p_{M-1}$ in
the  notation  of  exercise  44  are  fixed  for  all  K.  (Examples  of  such  methods  are  linear 
probing  and  the  sequences  considered  in  exercises  20  and  47.)  An  optimum  single 
hashing scheme is one for which Cm is minimum, over all $((M-1)!)^M$ single hashing
schemes  for  a given  M . When  M < 5 the  best  single  hashing  schemes  are  cyclic.  Is 
this  true  for  all  M? 

63.  [M25]  If  repeated  random  insertions  and  deletions  are  made  in  a hash  table,  how 
many  independent  insertions  are  needed  on  the  average  before  all  M locations  have 
become  occupied  at  one  time  or  another?  (This  is  the  mean  time  to  failure  of  the 
deletion  method  that  simply  marks  cells  “deleted.”) 

64.  [M41]  Analyze  the  expected  behavior  of  Algorithm  R (deletion  with  linear  prob- 
ing). How  many  times  will  step  R4  be  performed,  on  the  average? 

► 65.  [20]  ( Variable-length  keys.)  Many  applications  of  hash  tables  deal  with  keys  that 
can  be  any  number  of  characters  long.  In  such  cases  we  can’t  simply  store  the  key  in 
the  table  as  in  the  programs  of  this  section.  What  would  be  a good  way  to  deal  with 
variable-length  keys  in  a hash  table  on  the  MIX  computer? 

► 66.  [25]  (Ole  Amble,  1973.)  Is  it  possible  to  insert  keys  into  an  open  hash  table  mak- 
ing use  also  of  their  numerical  or  alphabetic  order,  so  that  a search  with  Algorithm  L 
or  Algorithm  D is  known  to  be  unsuccessful  whenever  a key  smaller  than  the  search 
argument  is  encountered? 

67. [M41] If Algorithm L inserts N keys with respective hash addresses $a_1 a_2 \ldots a_N$,
let $d_j$ be the displacement of the jth key from its home address; then $C_N = 1 +
(d_1 + d_2 + \cdots + d_N)/N$. Theorem P tells us that permutation of the a's has no effect
on the sum $d_1 + d_2 + \cdots + d_N$. However, such permutation might drastically change
the sum $d_1^2 + d_2^2 + \cdots + d_N^2$. For example, the hash sequence 1 2 ... N−1 N−1
makes $d_1 d_2 \ldots d_{N-1} d_N$ = 0 0 ... 0 N−1 and $\sum d_j^2 = (N-1)^2$, while its reflection
N−1 N−1 ... 2 1 leads to much more civilized displacements 0 1 ... 1 1 for which
$\sum d_j^2 = N - 1$.

a) Which rearrangement of $a_1 a_2 \ldots a_N$ minimizes $\sum d_j^2$?

b)  Explain  how  to  modify  Algorithm  L so  that  it  maintains  a least-variance  set  of 
displacements  after  every  insertion. 

c) Determine the average value of $\sum d_j^2$ with and without this modification.
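The two example hash sequences can be replayed directly. The sketch assumes Algorithm L's convention of probing downward (i ← i − 1, wrapping from 0 to M − 1). It uses illustrative sizes N = 6 and M = 10, and the function name is hypothetical.

```python
def displacements(hash_seq, M):
    """Displacement of each key from its home address under linear probing
    that steps downward (i <- i - 1, wrapping to M - 1), as in Algorithm L."""
    occupied = [False] * M
    result = []
    for a in hash_seq:
        i, d = a, 0
        while occupied[i]:
            i = (i - 1) % M
            d += 1
        occupied[i] = True
        result.append(d)
    return result


N, M = 6, 10                               # illustrative sizes; M > N keeps an empty cell
forward = list(range(1, N)) + [N - 1]      # 1 2 ... N-1 N-1
reflected = forward[::-1]                  # N-1 N-1 ... 2 1
for seq in (forward, reflected):
    d = displacements(seq, M)
    print(seq, d, "sum:", sum(d), "sum of squares:", sum(x * x for x in d))
```

Both orders give the same total displacement, 5, as Theorem P requires. The sums of squares are 25 and 5, respectively.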

68. [M41] What is the variance of the average number of probes in a successful search
by Algorithm L? In particular, what is the average of $(d_1 + d_2 + \cdots + d_N)^2$ in the notation
of exercise 67?
69. [M25] (Andrew Yao.) Prove that all cyclic single hashing schemes in the sense
of exercise 62 satisfy the inequality $C'_{\alpha M} \ge \frac{1}{2}\bigl(1 + 1/(1 - \alpha)\bigr)$. [Hint: Show that an
unsuccessful search takes exactly k probes with probability $p_k \le (M - N)/M$.]

70.  [HM43]  Prove  that  the  expected  number  of  probes  that  are  needed  to  insert  the 
$(\alpha M + 1)$st item with double hashing is at most the expected number needed to insert
the  (aM  + 0(log  M)/M  )th  item  with  uniform  probing. 

71.  [40]  Experiment  with  the  behavior  of  Algorithm  C when  it  has  been  adapted  to 
external  searching  as  described  in  the  text. 

► 72.  [M28]  ( Universal  hashing.)  Imagine  a gigantic  matrix  H that  has  one  column  for 
every possible key K. The entries of H are numbers between 0 and M − 1; the rows of H
represent  hash  functions.  We  say  that  H defines  a universal  family  of  hash  functions 
if  any  two  columns  agree  in  at  most  R/M  rows,  where  R is  the  total  number  of  rows. 

a)  Prove  that  if  H is  universal  in  this  sense,  and  if  we  select  a hash  function  h by 
choosing  a row  of  H at  random,  then  the  expected  size  of  the  list  containing  any 
given key K in the method of separate chaining (Fig. 38) will be ≤ 1 + N/M, after
we have inserted any set of N distinct keys $K_1, K_2, \ldots, K_N$.

b) Suppose each $h_j$ in (9) is a randomly chosen mapping from the set of all characters
to the set {0, 1, ..., M − 1}. Show that this corresponds to a universal family of
hash  functions. 

c) Would the result of (b) still be true if $h_j(0) = 0$ for all j, but $h_j(x)$ is random for
$x \ne 0$?

73. [M26] (Carter and Wegman.) Show that part (b) of the previous exercise holds
even when the $h_j$ are not completely random functions, but they have either of the
following special forms: (i) Let $x_j$ be the binary number $(b_{j(n-1)} \ldots b_{j1}b_{j0})_2$. Then
$h_j(x_j) = (a_{j(n-1)}b_{j(n-1)} + \cdots + a_{j1}b_{j1} + a_{j0}b_{j0}) \bmod M$, where each $a_{jk}$ is chosen
randomly modulo M. (ii) Let M be prime and assume that $0 \le x_j < M$. Then
$h_j(x_j) = (a_jx_j + b_j) \bmod M$, where $a_j$ and $b_j$ are chosen randomly modulo M.
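For a small prime M, the universality condition of exercise 72 can be checked by brute force for form (ii). This sketch assumes that (9) combines the per-character values by addition modulo M. It uses two-character keys and takes M = 5 so that all $5^4$ rows can be enumerated. It counts the rows on which two distinct keys collide and compares the maximum with R/M. The helper names are hypothetical.

```python
from itertools import combinations, product

M = 5  # form (ii) needs M prime; kept tiny so brute force is instant


def make_row(a1, b1, a2, b2):
    # Assumed shape of (9): h(x1 x2) = (h1(x1) + h2(x2)) mod M,
    # with each h_j(x) = (a_j x + b_j) mod M as in form (ii).
    return lambda key: ((a1 * key[0] + b1) + (a2 * key[1] + b2)) % M


rows = [make_row(*params) for params in product(range(M), repeat=4)]
keys = list(product(range(M), repeat=2))          # every two-character key
R = len(rows)
worst = max(sum(h(x) == h(y) for h in rows) for x, y in combinations(keys, 2))
print(worst, R // M)   # 125 125
```

Here every pair of distinct keys collides on exactly $M^3 = 125$ of the 625 rows, so the bound R/M is met with equality.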

74. [M29] Let H define a universal family of hash functions. Prove or disprove: Given
any N distinct columns, and any row chosen at random, the expected number of zeros in
those columns is O(1) + O(N/M). [Thus, every list in the method of separate chaining
will  have  this  expected  size.] 

75.  [M26]  Prove  or  disprove  the  following  statements  about  the  hash  function  h of  (9), 
when  the  hj  are  independent  random  functions: 

a) The probability that $h(K) = m$ is 1/M, for all $0 \le m < M$.

b) If $K \ne K'$, the probability that $h(K) = m$ and $h(K') = m'$ is $1/M^2$, for all
$0 \le m, m' < M$.

c) If K, K', and K'' are distinct, the probability that $h(K) = m$, $h(K') = m'$, and
$h(K'') = m''$ is $1/M^3$, for all $0 \le m, m', m'' < M$.

d) If K, K', K'', and K''' are distinct, the probability that $h(K) = m$, $h(K') = m'$,
$h(K'') = m''$, and $h(K''') = m'''$ is $1/M^4$, for all $0 \le m, m', m'', m''' < M$.

► 76. [M21] Suggest a way to modify (9) for keys with variable length, preserving the
properties  of  universal  hashing. 

77. [M22] Let H define a universal family of hash functions from 32-bit keys to 16-bit
keys. (Thus H has $2^{32}$ columns, and $M = 2^{16}$, in the notation of exercise 72.) A 256-bit
key can be regarded as the concatenation of eight 32-bit parts $x_1x_2x_3x_4x_5x_6x_7x_8$; we
can map it into a 16-bit address with the hash function

$$h_4\bigl(h_3(h_2(h_1(x_1)h_1(x_2))\,h_2(h_1(x_3)h_1(x_4)))\,h_3(h_2(h_1(x_5)h_1(x_6))\,h_2(h_1(x_7)h_1(x_8)))\bigr),$$

where $h_1$, $h_2$, $h_3$, and $h_4$ are randomly and independently chosen rows of H. (Here, for
example, $h_1(x_1)h_1(x_2)$ stands for the 32-bit number obtained by concatenating $h_1(x_1)$
with $h_1(x_2)$.) Prove that the probability is less than $2^{-14}$ that two distinct keys hash to
the same address. [This scheme requires substantially fewer random choices than (9).]
► 78. [M26] (P. Woelfel.) If $0 \le x < 2^n$, let $h_{a,b}(x) = \lfloor (ax + b)/2^k \rfloor \bmod 2^{n-k}$. Show
that the set $\{h_{a,b} \mid 0 < a < 2^n,\ a \text{ odd, and } 0 \le b < 2^k\}$ is a universal family of hash
functions from n-bit keys to (n − k)-bit keys. (These functions are particularly easy to
implement  on  a binary  computer.) 
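Woelfel's functions need only a multiplication, an addition, a shift, and a mask. The brute-force sketch below (hypothetical helper names) builds every $h_{a,b}$ for n = 5 and k = 2. It then checks that no two distinct 5-bit keys agree on more than R/M rows, where $M = 2^{n-k}$. This checks one small case and does not prove the general claim.

```python
from itertools import combinations


def woelfel_family(n, k):
    """Every h_{a,b}(x) = floor((a*x + b) / 2^k) mod 2^(n-k), for odd a with
    0 < a < 2^n and 0 <= b < 2^k (exercise 78)."""
    mask = (1 << (n - k)) - 1

    def make(a, b):
        return lambda x: ((a * x + b) >> k) & mask   # shift = floor division by 2^k

    return [make(a, b) for a in range(1, 1 << n, 2) for b in range(1 << k)]


def is_universal(rows, keys, M):
    """Exercise 72's test: any two distinct columns agree in at most R/M rows."""
    R = len(rows)
    return all(sum(h(x) == h(y) for h in rows) * M <= R
               for x, y in combinations(keys, 2))


n, k = 5, 2
rows = woelfel_family(n, k)
print(len(rows), is_universal(rows, range(1 << n), 1 << (n - k)))   # 64 True
```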


She  made  a hash  of  the  proper  names,  to  be  sure. 
— GRANT  ALLEN,  The  Tents  of  Shem  (1889) 

HASH,  x.  There  is  no  definition 
for  this  word  — 
nobody  knows  what  hash  is. 
— AMBROSE  BIERCE,  The  Devil's  Dictionary  (1906) 
6.5. RETRIEVAL ON SECONDARY KEYS

We  HAVE  NOW  COMPLETED  our  study  of  searching  for  primary  keys,  namely  for 
keys  that  uniquely  specify  a record  in  a file.  But  it  is  sometimes  necessary  to 
conduct  a search  based  on  the  values  of  other  fields  in  the  records  besides  the 
primary  key;  these  other  fields  are  often  called  secondary  keys  or  attributes  of 
the  record.  For  example,  in  an  enrollment  file  that  contains  information  about 
the  students  at  a university,  it  may  be  desirable  to  search  for  all  sophomores 
from  Ohio  who  are  not  majoring  in  mathematics  or  statistics;  or  to  search  for 
all  unmarried  French-speaking  graduate  student  women;  etc. 

In  general,  we  assume  that  each  record  contains  several  attributes,  and  we 
want  to  search  for  all  records  that  have  certain  values  of  certain  attributes.  The 
specification  of  the  desired  records  is  called  a query.  Queries  are  usually  restricted 
to  at  most  the  following  three  types: 

a)  A simple  query  that  gives  a specific  value  of  a specific  attribute;  for  example, 

“MAJOR = MATHEMATICS”, or “RESIDENCE.STATE = OHIO”.

b)  A range  query  that  gives  a specific  range  of  values  for  a specific  attribute; 
for  example,  “COST  < $18.00”,  or  “21  < AGE  < 23”. 

c)  A Boolean  query  that  consists  of  the  previous  types  combined  with  the 
operations  AND,  OR,  NOT;  for  example, 

“(CLASS = SOPHOMORE) AND (RESIDENCE.STATE = OHIO)

AND  NOT  ((MAJOR  = MATHEMATICS)  OR  (MAJOR  = STATISTICS))”. 
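The three kinds of queries compose naturally as predicates on a record. In this sketch the record layout, the helper names, and the sample records are all hypothetical. Each record is a dict keyed by attribute names such as "RESIDENCE.STATE". Range queries are taken to include both endpoints.

```python
# Sketch: the three query types as composable predicates on a record.
# Assumptions (hypothetical): a record is a dict from attribute name to value,
# and the sample records below are invented for illustration.

def simple(attr, value):            # type (a): ATTR = value
    return lambda rec: rec[attr] == value


def in_range(attr, lo, hi):         # type (b): lo <= ATTR <= hi (inclusive bounds)
    return lambda rec: lo <= rec[attr] <= hi


def AND(*qs):                       # type (c): Boolean combinations
    return lambda rec: all(q(rec) for q in qs)


def OR(*qs):
    return lambda rec: any(q(rec) for q in qs)


def NOT(q):
    return lambda rec: not q(rec)


query = AND(simple("CLASS", "SOPHOMORE"),
            simple("RESIDENCE.STATE", "OHIO"),
            NOT(OR(simple("MAJOR", "MATHEMATICS"),
                   simple("MAJOR", "STATISTICS"))))

students = [
    {"NAME": "A", "CLASS": "SOPHOMORE", "RESIDENCE.STATE": "OHIO", "MAJOR": "HISTORY", "AGE": 19},
    {"NAME": "B", "CLASS": "SOPHOMORE", "RESIDENCE.STATE": "OHIO", "MAJOR": "STATISTICS", "AGE": 20},
    {"NAME": "C", "CLASS": "JUNIOR", "RESIDENCE.STATE": "OHIO", "MAJOR": "HISTORY", "AGE": 21},
]
print([r["NAME"] for r in students if query(r)])                    # ['A']
print([r["NAME"] for r in students if in_range("AGE", 20, 23)(r)])  # ['B', 'C']
```

Evaluating such a predicate against every record is the brute-force baseline. The techniques of this section aim to avoid examining every record.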

The  problem  of  discovering  efficient  search  techniques  for  these  three  types  of 
queries  is  already  quite  difficult,  and  therefore  queries  of  more  complicated  types 
are  usually  not  considered.  For  example,  a railroad  company  might  have  a file 
giving  the  current  status  of  all  its  freight  cars;  a query  such  as  “find  all  empty 
refrigerator  cars  within  500  miles  of  Seattle”  would  not  be  explicitly  allowed, 
unless  “distance  from  Seattle”  were  an  attribute  stored  within  each  record  instead 
of  a complicated  function  to  be  deduced  from  other  attributes.  And  the  use  of 
logical  quantifiers,  in  addition  to  AND,  OR,  and  NOT,  would  introduce  further 
complications,  limited  only  by  the  imagination  of  the  query-poser;  given  a file  of 
baseball  statistics,  for  example,  we  might  ask  for  the  longest  consecutive  hitting 
streak  in  night  games.  These  examples  are  complicated,  but  they  can  still  be 
handled  by  taking  one  pass  through  a suitably  arranged  file.  Other  queries 
are  even  more  difficult  — for  example,  to  find  all  pairs  of  records  that  have  the 
same  values  on  five  or  more  attributes  (without  specifying  which  attributes  must 
match).  Such  queries  may  be  regarded  as  general  programming  tasks  that  are 
beyond  the  scope  of  this  discussion,  although  they  can  often  be  broken  down 
into  subproblems  of  the  kind  considered  here. 

Before  we  begin  to  study  the  various  techniques  for  secondary  key  retrieval, 
it  is  important  to  put  the  subject  in  a proper  economic  context.  Although  a 
vast  number  of  applications  fit  into  the  general  framework  of  the  three  types  of 
queries  outlined  above,  not  many  of  these  applications  are  really  suited  to  the 
sophisticated  techniques  we  shall  be  studying,  and  some  of  them  are  better  done 
by hand than by machine! People climb Mt. Everest “because it is there” and
because  tools  have  been  developed  that  make  the  climb  possible;  similarly,  when 
faced  with  a mountain  of  data,  people  are  tempted  to  use  a computer  to  find  the 
answer  to  the  most  difficult  queries  they  can  dream  up,  in  an  online  real-time 
environment,  without  properly  balancing  the  cost.  The  desired  calculations  are 
possible,  but  they’re  not  right  for  everyone’s  application. 

For example, consider the following simple approach to secondary key retrieval:
After batching a number of queries, we can do a sequential search through
the  entire  file,  retrieving  all  the  relevant  records.  (“Batching”  means  that  we 
accumulate  a number  of  queries  before  doing  anything  about  them.)  This  method 
is  quite  satisfactory  if  the  file  isn’t  too  large  and  if  the  queries  don’t  have  to  be 
handled  immediately.  It  can  be  used  even  with  tape  files,  and  it  only  ties  up 
the  computer  at  odd  intervals,  so  it  will  tend  to  be  very  economical  in  terms 
of  equipment  costs.  Moreover,  it  will  even  handle  computational  queries  of  the 
“distance  to  Seattle”  type  discussed  above. 
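The batched sequential search just described can be sketched in a few lines. This is a minimal illustration, assuming that records are Python dicts and that each query is an arbitrary predicate; the field names and sample data are hypothetical. Because every query is just a function, the same single pass would also accommodate computed conditions such as a distance test.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Query = Callable[[Record], bool]

def batch_search(records: List[Record], queries: List[Query]) -> List[List[int]]:
    """Answer every query in the batch with one sequential pass over the file.

    Returns, for each query, the positions of the records that satisfy it.
    """
    answers: List[List[int]] = [[] for _ in queries]
    for pos, rec in enumerate(records):        # a single pass through the file
        for qi, q in enumerate(queries):       # test each accumulated query
            if q(rec):
                answers[qi].append(pos)
    return answers

# Hypothetical sample file and batch of queries.
students = [
    {"CLASS": "SOPHOMORE", "MAJOR": "HISTORY", "STATE": "OHIO", "AGE": 19},
    {"CLASS": "SENIOR", "MAJOR": "MATHEMATICS", "STATE": "OHIO", "AGE": 22},
    {"CLASS": "SOPHOMORE", "MAJOR": "STATISTICS", "STATE": "OHIO", "AGE": 20},
]
batch: List[Query] = [
    lambda r: r["MAJOR"] == "MATHEMATICS",                  # simple query
    lambda r: 21 <= r["AGE"] <= 23,                         # range query
    lambda r: (r["CLASS"] == "SOPHOMORE" and r["STATE"] == "OHIO"
               and r["MAJOR"] not in ("MATHEMATICS", "STATISTICS")),  # Boolean query
]
print(batch_search(students, batch))   # [[1], [1], [0]]
```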

Another  simple  way  to  facilitate  secondary  key  retrieval  is  to  let  people 
do  part  of  the  work,  by  providing  them  with  suitable  printed  indexes  to  the 
information.  This  method  is  often  the  most  reasonable  and  economical  way  to 
proceed  (provided,  of  course,  that  the  old  paper  is  recycled  whenever  a new  index 
is  printed),  especially  because  people  tend  to  notice  interesting  patterns  when 
they  have  convenient  access  to  masses  of  data. 

The  applications  that  are  not  satisfactorily  handled  by  the  simple  schemes 
given above involve very large files for which quick responses to queries are important.
Such a situation would occur, for example, if the file were continuously
being  queried  by  a number  of  simultaneous  users,  or  if  the  queries  were  being 
generated  by  machine  instead  of  by  people.  Our  goal  in  this  section  will  be  to 
see  how  well  we  can  do  secondary  key  retrieval  with  conventional  computers, 
under  various  assumptions  about  the  file  structure.  Fortunately,  the  methods 
we  will  discuss  are  becoming  more  and  more  feasible  in  practice,  as  the  cost  of 
computation  continues  to  decrease  dramatically. 

A lot  of  good  ideas  have  been  developed  for  dealing  with  the  problem,  but  (as 
the  reader  will  have  guessed  from  all  these  precautionary  remarks)  the  algorithms 
are  by  no  means  as  good  as  those  available  for  primary  key  retrieval.  Because  of 
the  wide  variety  of  files  and  applications,  we  will  not  be  able  to  give  a complete 
discussion  of  all  the  possibilities  that  have  been  considered,  or  to  analyze  the 
behavior  of  each  algorithm  in  typical  environments.  The  remainder  of  this 
section  presents  the  basic  approaches  that  have  been  proposed,  and  it  is  left 
to  the  reader’s  imagination  to  decide  what  combination  of  techniques  is  most 
appropriate  in  each  particular  case. 

Inverted files. The first important class of techniques for secondary key retrieval
is based on the idea of an inverted file. This does not mean that the
file  is  turned  upside  down;  it  means  that  the  roles  of  records  and  attributes  are 
reversed.  Instead  of  listing  the  attributes  of  a given  record,  we  list  the  records 
having  a given  attribute. 
We encounter inverted files (under other names) quite often in our daily lives.
For  example,  the  inverted  file  corresponding  to  a Russian-English  dictionary  is 
an  English-Russian  dictionary.  The  inverted  file  corresponding  to  this  book  is 
the  index  that  appears  at  the  close  of  the  book.  Accountants  traditionally  use 
“double-entry  bookkeeping,”  where  all  transactions  are  entered  both  in  a cash 
account  and  in  a customer  account,  so  that  the  current  cash  position  and  the 
current  customer  liability  are  both  readily  accessible. 

In  general,  an  inverted  file  usually  doesn’t  stand  by  itself;  it  is  to  be  used 
together  with  the  original  uninverted  file.  It  provides  duplicate,  redundant 
information  in  order  to  speed  up  secondary  key  retrieval.  The  components  of 
an inverted file are called inverted lists, namely the lists of all records having a
given  value  of  some  attribute. 
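As a concrete illustration, the following sketch builds an inverted file with one inverted list per (attribute, value) pair. It assumes records are Python dicts and uses record positions as the list entries; the miniature file is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def invert(records: List[Dict[str, Hashable]]) -> Dict[Tuple[str, Hashable], List[int]]:
    """Build one inverted list per (attribute, value) pair.

    Records are scanned in file order, so every list comes out sorted
    by record position.
    """
    inverted: Dict[Tuple[str, Hashable], List[int]] = defaultdict(list)
    for pos, rec in enumerate(records):
        for attr, value in rec.items():
            inverted[(attr, value)].append(pos)
    return dict(inverted)

# Hypothetical miniature file.
sample = [
    {"MAJOR": "MATHEMATICS", "STATE": "OHIO"},
    {"MAJOR": "HISTORY", "STATE": "IOWA"},
    {"MAJOR": "MATHEMATICS", "STATE": "IOWA"},
]
inv = invert(sample)
print(inv[("MAJOR", "MATHEMATICS")])   # [0, 2]
print(inv[("STATE", "IOWA")])          # [1, 2]
```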

Like  all  lists,  the  inverted  lists  can  be  represented  in  many  ways  within 
a computer,  and  different  modes  of  representation  are  appropriate  at  different 
times.  Some  secondary  key  fields  have  only  two  values  (for  example,  “SEX”),  and 
the  corresponding  inverted  lists  are  quite  long;  but  other  fields  typically  have  a 
great many values with few duplications (for example, “PHONENUMBER”).

Imagine  that  we  want  to  store  the  information  in  a telephone  directory  so 
that  all  entries  can  be  retrieved  on  the  basis  of  either  name,  phone  number,  or 
residence  address.  One  solution  is  simply  to  make  three  separate  files,  oriented 
to  retrieval  on  each  type  of  key.  Another  idea  is  to  combine  the  files,  for  example 
by  making  three  hash  tables  that  serve  as  the  list  heads  for  the  chaining  method. 
In  the  latter  scheme,  each  record  of  the  file  would  be  an  element  of  three  lists, 
and  it  would  therefore  contain  three  link  fields;  this  is  the  so-called  multilist 
method  illustrated  in  Fig.  13  of  Section  2.2.6  and  discussed  further  below.  A 
third  possibility  is  to  combine  the  three  files  into  one  super  file,  by  analogy  with 
library  card  catalogues  in  which  author  cards,  title  cards,  and  subject  cards  are 
all  alphabetized  together. 

A consideration  of  the  format  used  in  the  index  to  this  book  leads  to 
further  ideas  on  inverted  list  representation.  For  secondary  key  fields  in  which 
there  are  typically  five  or  so  entries  per  attribute  value,  we  can  simply  make 
a short  sequential  list  of  the  record  locations  (analogous  to  page  locations  in 
a book  index),  following  the  key  value.  If  related  records  tend  to  be  clustered 
consecutively,  a range  specification  code  (for  example,  pages  559-582)  is  useful. 
If  the  records  in  the  file  tend  to  be  reallocated  frequently,  it  may  be  better  to 
use  primary  keys  instead  of  record  locations  in  the  inverted  files,  so  that  no 
updating  needs  to  be  done  when  the  locations  change;  for  example,  references 
to  Bible  passages  are  always  given  by  chapter  and  verse,  and  the  index  to  some 
books  is  based  on  paragraph  numbers  instead  of  page  numbers. 

None  of  these  ideas  is  especially  appropriate  for  the  case  of  a two-valued 
attribute like “SEX”. In such a case only one inverted list is needed, of course,
since  the  non-males  will  be  female  and  conversely.  If  each  value  relates  to  about 
half  the  items  of  the  file,  the  inverted  list  will  be  horribly  long,  but  we  can 
solve  the  problem  rather  nicely  on  a binary  computer  by  using  a bit  string 
representation,  with  each  bit  specifying  the  value  of  a particular  record.  Thus 
the bit string 01001011101... might mean that the first record in the file refers
to  a male,  the  second  female,  the  next  two  male,  etc. 

Such methods suffice to handle simple queries about specific attribute values.
A slight extension makes it possible to treat range queries, except that a
comparison-based  search  scheme  (Section  6.2)  must  be  used  instead  of  hashing. 

For Boolean queries like “(MAJOR = MATHEMATICS) AND (RESIDENCE.STATE
= OHIO)”, we need to intersect two inverted lists. This can be done in several
ways;  for  example,  if  both  lists  are  ordered,  one  pass  through  each  will  pick  out 
all  common  entries.  Alternatively,  we  could  select  the  shortest  list  and  look  up 
each  of  its  records,  checking  the  other  attributes;  but  this  method  works  only 
for  AND’s,  not  for  OR’s,  and  it  is  unattractive  on  external  files  because  it  requires 
many  accesses  to  records  that  will  not  satisfy  the  query. 
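The one-pass intersection of two ordered lists is an ordinary merge. The sketch below assumes each inverted list is a Python list of record numbers in increasing order (the sample lists are hypothetical); the same kind of merging pass also forms unions, which the look-up-the-shortest-list method cannot do.

```python
from typing import List

def intersect(a: List[int], b: List[int]) -> List[int]:
    """Entries common to two increasing lists, in one pass through each."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a: List[int], b: List[int]) -> List[int]:
    """Entries of either increasing list (each once), in one merging pass."""
    i = j = 0
    out: List[int] = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:                       # equal entries are reported once
            out.append(a[i])
            i += 1
            j += 1
    return out

mathematics = [2, 5, 9, 14]         # hypothetical inverted lists
ohio = [1, 5, 7, 14, 20]
print(intersect(mathematics, ohio))  # [5, 14]
print(union(mathematics, ohio))      # [1, 2, 5, 7, 9, 14, 20]
```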

The  same  considerations  show  that  a multilist  organization  as  described 
above  is  inefficient  for  Boolean  queries  on  an  external  file,  since  it  implies  many 
unnecessary  accesses.  For  example,  imagine  what  would  happen  if  the  index  to 
this  book  were  organized  in  a multilist  manner:  Each  entry  of  the  index  would 
refer  only  to  the  last  page  on  which  its  particular  subject  was  mentioned;  then 
on  every  page  there  would  be  a further  reference,  for  each  subject  on  that  page, 
to  the  previous  occurrence  of  that  subject.  In  order  to  find  all  pages  relevant 
to  “[Analysis  of  algorithms]  and  [(External  sorting)  or  (External  searching)]”, 
we  would  need  to  turn  many  pages.  On  the  other  hand,  the  same  query  can  be 
resolved  by  looking  at  only  two  pages  of  the  real  index  as  it  actually  appears, 
doing  simple  operations  on  the  inverted  lists  in  order  to  find  the  small  subset  of 
pages  that  satisfy  the  query. 

When an inverted list is represented as a bit string, Boolean combinations
of simple queries are, of course, easily performed, because computers can
manipulate  bit  strings  at  relatively  high  speed.  For  mixed  queries  in  which 
some  attributes  are  represented  as  sequential  lists  of  record  numbers  while  other 
attributes  are  represented  as  bit  strings,  it  is  not  difficult  to  convert  the  sequential 
lists  into  bit  strings,  then  to  perform  the  Boolean  operations  on  these  bit  strings. 
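Here is a minimal sketch of that conversion, followed by a Boolean query evaluated entirely on bit strings. Python's unbounded integers stand in for machine words, and bit p (counting from the least significant end) represents record p; that bit order, the file size N, and the sample lists are assumptions for illustration. Because the integers are unbounded, NOT is taken relative to a mask of N one-bits.

```python
from typing import Iterable, List

def to_bits(positions: Iterable[int]) -> int:
    """Convert a sequential list of record numbers into a bit string (an int).

    Bit p, counting from the least significant end, is 1 iff record p is listed.
    """
    bits = 0
    for p in positions:
        bits |= 1 << p
    return bits

def from_bits(bits: int) -> List[int]:
    """Record numbers whose bits are 1, in increasing order (bits must be >= 0)."""
    out, p = [], 0
    while bits:
        if bits & 1:
            out.append(p)
        bits >>= 1
        p += 1
    return out

N = 12                          # hypothetical number of records in the file
ALL = (1 << N) - 1              # N one-bits; NOT is taken relative to this mask
sophomore = to_bits([0, 3, 4, 8, 11])   # hypothetical inverted lists
ohio = to_bits([0, 1, 4, 5, 8])
mathematics = to_bits([4])
statistics = to_bits([9])
# (CLASS = SOPHOMORE) AND (STATE = OHIO) AND NOT ((MAJOR = MATH) OR (MAJOR = STAT))
answer = sophomore & ohio & (ALL & ~(mathematics | statistics))
print(from_bits(answer))        # [0, 8]
```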

A quantitative  example  of  a hypothetical  application  may  be  helpful  at  this 
point.  Assume  that  we  have  1,000,000  records  of  40  characters  each,  and  that 
our  file  is  stored  on  MIXTEC  disks,  as  described  in  Section  5.4.9.  The  file  itself 
therefore  fills  two  disk  units,  and  the  inverted  lists  will  probably  fill  several 
more.  Each  track  contains  5000  characters  = 30,000  bits,  so  an  inverted  list 
for  a particular  attribute  will  take  up  at  most  34  tracks.  (This  maximum 
number  of  tracks  occurs  when  the  bitstring  representation  is  the  shortest  possible 
one.)  Suppose  that  we  have  a rather  involved  query  that  refers  to  a Boolean 
combination  of  10  inverted  lists;  in  the  worst  case  we  will  have  to  read  340  tracks 
of information from the inverted file, for a total read time of 340 × 25 ms = 8.5 sec.
The  average  latency  delay  will  be  about  one  half  of  the  read  time,  but  by  careful 
programming  we  may  be  able  to  eliminate  the  latency.  By  storing  the  first  track 
of  each  bitstring  list  in  one  cylinder,  and  the  second  track  of  each  list  in  the  next, 
etc.,  most  of  the  seek  time  will  be  eliminated,  so  we  can  estimate  the  maximum 
seek time as about 34 × 26 ms ≈ 0.9 sec (or twice this if two independent disk
units are involved). Finally, if q records satisfy the query, we will need about
q × (60 ms (seek) + 12.5 ms (latency) + 0.2 ms (read)) extra time to fetch each
one for subsequent processing. Thus an optimistic estimate of the total expected
time to process this rather complicated query is roughly (10 + .073q) seconds.
This  may  be  contrasted  with  about  210  seconds  to  read  through  the  entire  file 
at  top  speed  under  the  same  assumptions  without  using  any  inverted  lists. 
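The arithmetic behind this estimate is easy to reproduce. The sketch below recomputes the figures quoted above (six-bit characters, so 5000 characters = 30,000 bits per track; 25 ms to read a track; about 26 ms per cylinder seek; 72.7 ms to fetch each qualifying record) and compares them with the 210-second full scan. The text rounds 8.5 + 0.9 seconds up to 10. The break-even value at the end extrapolates the same simplified model and is not a figure from the text.

```python
import math

TRACK_BITS = 5000 * 6          # 5000 six-bit characters per track = 30,000 bits
RECORDS = 1_000_000            # one bit per record in a bit-string inverted list
LISTS = 10                     # inverted lists named in the Boolean query
READ_PER_TRACK = 0.025         # seconds to read one track
SEEK_PER_CYLINDER = 0.026      # seconds, roughly, per cylinder seek
FETCH_PER_RECORD = 0.060 + 0.0125 + 0.0002   # seek + latency + read, in seconds
FULL_SCAN = 210.0              # seconds to read the whole file, as stated above

tracks_per_list = math.ceil(RECORDS / TRACK_BITS)       # at most 34 tracks
read_time = LISTS * tracks_per_list * READ_PER_TRACK    # 8.5 seconds
seek_time = tracks_per_list * SEEK_PER_CYLINDER         # about 0.9 seconds

def inverted_time(q: int) -> float:
    """Estimated seconds to answer the query when q records qualify."""
    return read_time + seek_time + q * FETCH_PER_RECORD

# Number of qualifying records at which a full scan would be just as fast.
break_even = (FULL_SCAN - read_time - seek_time) / FETCH_PER_RECORD

print(tracks_per_list, round(read_time, 3), round(seek_time, 3))  # 34 8.5 0.884
print(round(inverted_time(100), 2))                               # 16.65
print(round(break_even))                                          # 2760
```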

This example shows that space optimization is closely related to time optimization
in a disk memory; the time to process the inverted lists is roughly the
time  needed  to  seek  and  to  read  them. 

The  discussion  above  has  more  or  less  assumed  that  the  file  is  not  growing 
or  shrinking  as  we  query  it;  what  should  we  do  if  frequent  updates  are  necessary? 
In  many  applications  it  is  sufficient  to  batch  a number  of  requests  for  updates, 
and  to  take  care  of  them  in  dull  moments  when  no  queries  need  to  be  answered. 
Alternatively, if updating the file has high priority, the method of B-trees (Section
6.2.4) is attractive. The entire collection of inverted lists could be made
into one huge B-tree, with special conventions for the leaves so that the branch
nodes contain key values while the leaves contain both keys and lists of pointers
to records. File updates can also be handled by other methods that we shall
discuss  below. 

Geometric  data.  A great  many  applications  deal  with  points,  lines,  and  shapes 
in spaces of two or more dimensions. One of the first approaches to distance-oriented
queries was the “post-office tree” proposed in 1972 by Bruce McNutt.
Suppose, for example, that we wish to handle queries like “What is the nearest
city to point x?”, given the value of x. Each node of McNutt’s tree corresponds
to a city y and a “test radius” r; the left subtree of this node corresponds to
all cities z entered subsequently into this part of the tree such that the distance
from y to z is ≤ r + δ, and the right subtree similarly is for distances > r − δ.
Here δ is a given tolerance; cities between r − δ and r + δ away from y must be
entered in both subtrees. Searching in such a tree makes it possible to locate all
cities within distance δ of a given point. (See Fig. 45.)


Fig.  45.  The  top  levels  of  an  example  “post-office  tree.”  To  search  for  all  cities  near 
a given  point  x,  start  at  the  root:  If  x is  within  1800  miles  of  Las  Vegas,  go  left, 
otherwise  go  to  the  right;  then  repeat  the  process  until  encountering  a terminal  node. 
The  method  of  tree  construction  ensures  that  all  cities  within  20  miles  of  x will  be 
encountered  during  this  search. 
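The construction and search rules can be sketched as follows. This is a minimal illustration, not McNutt's program: it assumes points in the plane with Euclidean distance (via `math.dist`, Python 3.8+), a caller-chosen root radius, and fixed shrink factors 0.67 and 0.57 borrowed from the experiments described next, while omitting the rule that leaves r unchanged on the second of two consecutive right branches. Every city on the search path is checked against x, so the path reports each city within δ of x.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class PONode:
    city: str
    where: Point
    radius: float                      # the test radius r
    left: Optional["PONode"] = None    # later cities at distance <= r + delta
    right: Optional["PONode"] = None   # later cities at distance >  r - delta

class PostOfficeTree:
    def __init__(self, delta: float, root_radius: float,
                 shrink_left: float = 0.67, shrink_right: float = 0.57) -> None:
        self.delta, self.root_radius = delta, root_radius
        self.shrink_left, self.shrink_right = shrink_left, shrink_right
        self.root: Optional[PONode] = None
        self.nodes = 0

    def insert(self, city: str, where: Point) -> None:
        if self.root is None:
            self.root, self.nodes = PONode(city, where, self.root_radius), 1
        else:
            self._insert(self.root, city, where)

    def _insert(self, node: PONode, city: str, where: Point) -> None:
        d = math.dist(node.where, where)
        if d <= node.radius + self.delta:        # must be findable on the left
            if node.left is None:
                node.left = PONode(city, where, node.radius * self.shrink_left)
                self.nodes += 1
            else:
                self._insert(node.left, city, where)
        if d > node.radius - self.delta:         # and/or on the right
            if node.right is None:
                node.right = PONode(city, where, node.radius * self.shrink_right)
                self.nodes += 1
            else:
                self._insert(node.right, city, where)

    def near(self, x: Point) -> List[str]:
        """Follow one root-to-leaf path, reporting cities within delta of x."""
        found, node = [], self.root
        while node is not None:
            d = math.dist(node.where, x)
            if d <= self.delta:
                found.append(node.city)
            node = node.left if d <= node.radius else node.right
        return found

t = PostOfficeTree(delta=1.0, root_radius=10.0)     # hypothetical data
for name, p in [("A", (0.0, 0.0)), ("B", (3.0, 4.0)), ("C", (12.0, 0.0)), ("D", (3.5, 4.0))]:
    t.insert(name, p)
print(sorted(t.near((3.2, 4.1))))   # ['B', 'D']
```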
Several experiments based on this idea were conducted by McNutt and
Edward  Pring,  using  the  231  most  populous  cities  in  the  continental  United 
States  in  random  order  as  an  example  database.  They  let  the  test  radii  shrink  in 
a regular  manner,  replacing  r by  0.67r  when  going  to  the  left,  and  by  0.57r  when 
going  to  the  right,  except  that  r was  left  unchanged  when  taking  the  second  of 
two  consecutive  right  branches.  The  result  was  that  610  nodes  were  required  in 
the tree for δ = 20 miles, and 1600 nodes were required for δ = 35 miles. The
top  levels  of  their  smaller  tree  are  shown  in  Fig.  45.  (In  the  remaining  levels  of 
this  tree,  Orlando  FL  appeared  below  both  Jacksonville  and  Miami.  Some  cities 
occurred  quite  often;  for  example,  17  of  the  nodes  were  for  Brockton  MA!) 

The rapid file growth as δ increases indicates that post-office trees probably
have  limited  utility.  We  can  do  better  by  working  directly  with  the  coordinates 
of  each  point,  regarding  the  coordinates  as  attributes  or  secondary  keys;  then 
we  can  make  Boolean  queries  based  on  ranges  of  the  keys.  For  example,  suppose 
that  the  records  of  the  file  refer  to  North  American  cities,  and  that  the  query 
asks  for  all  cities  with 

(21.49°  < LATITUDE  < 37.41°)  AND  (70.34°  < LONGITUDE  < 75.72°). 

Reference  to  a map  will  show  that  many  cities  satisfy  this  LATITUDE  range,  and 
many  satisfy  the  LONGITUDE  range,  but  hardly  any  cities  lie  in  both  ranges.  One 
approach  to  such  orthogonal  range  queries  is  to  partition  the  set  of  all  possible 
LATITUDE  and  LONGITUDE  values  rather  coarsely,  with  only  a few  classes  per 
attribute  (for  example,  by  truncating  to  the  next  lower  multiple  of  5°),  then  to 
have  one  inverted  list  for  each  combined  (LATITUDE,  LONGITUDE)  class.  This  is 
like  having  maps  with  one  page  for  each  local  region.  Using  5°  intervals,  the  query 
above  would  refer  to  eight  pages,  namely  (20°,  70°),  (25°,  70°),  ...,  (35°,  75°). 
The  range  query  needs  to  be  processed  for  each  of  these  pages,  either  by  going  to 
a finer  partition  within  the  page  or  by  direct  reference  to  the  records  themselves, 
depending  on  the  number  of  records  corresponding  to  that  page.  In  a sense  this 
is  a tree  structure  with  two-dimensional  branching  at  each  internal  node. 
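For the 5° partition, the pages touched by a query rectangle can be listed directly. A small sketch, assuming (as in the text) that each class is named by the next lower multiple of the interval width; it reproduces the eight pages of the example query.

```python
import math
from itertools import product
from typing import List, Tuple

def page_of(lat: float, lon: float, width: float = 5.0) -> Tuple[int, int]:
    """Coarse class of a point: each coordinate truncated to a multiple of width."""
    return (int(math.floor(lat / width) * width), int(math.floor(lon / width) * width))

def pages(lat_lo: float, lat_hi: float, lon_lo: float, lon_hi: float,
          width: float = 5.0) -> List[Tuple[int, int]]:
    """All (LATITUDE, LONGITUDE) classes that a query rectangle touches."""
    lats = range(math.floor(lat_lo / width), math.floor(lat_hi / width) + 1)
    lons = range(math.floor(lon_lo / width), math.floor(lon_hi / width) + 1)
    return [(int(a * width), int(b * width)) for a, b in product(lats, lons)]

print(pages(21.49, 37.41, 70.34, 75.72))
# [(20, 70), (20, 75), (25, 70), (25, 75), (30, 70), (30, 75), (35, 70), (35, 75)]
print(page_of(34.05, 72.5))   # (30, 70)
```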

A substantial elaboration of this approach, called a grid file, was developed
by J. Nievergelt, H. Hinterberger, and K. C. Sevcik [ACM Trans. Database
Systems 9 (1984), 38–71]. If each point $x$ has $k$ coordinates $(x_1, \ldots, x_k)$, they
divide the $i$th coordinate values into ranges

$-\infty = g_{i0} < g_{i1} < \cdots < g_{ir_i} = +\infty$   (1)

and locate $x$ by determining indices $(j_1, \ldots, j_k)$ such that

$0 \le j_i < r_i, \quad g_{ij_i} \le x_i < g_{i(j_i+1)} \quad \text{for } 1 \le i \le k.$   (2)

All points that have a given value of $(j_1, \ldots, j_k)$ form a cell. Records for
points in the same cell are stored in the same bucket in an external memory.
Buckets are also allowed to contain points from several adjacent cells, provided
that each bucket corresponds to a $k$-dimensional rectangular region or “supercell.”
Various strategies for updating the grid boundary values $g_{ij}$ and for
splitting  or  combining  buckets  are  possible;  see,  for  example,  K.  Hinrichs,  BIT  25 
(1985), 569–592. The characteristics of grid files with random data have been
analyzed  by  M.  Regnier,  BIT  25  (1985),  335-357;  P.  Flajolet  and  C.  Puech, 
JACM  33  (1986),  371-407,  §4.2. 
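Locating a point's cell as in (2) amounts to one binary search per coordinate. The sketch below assumes that each scale is kept as a sorted Python list of its finite boundaries and that the grid directory is a dict from cells to bucket names; both that layout and the sample grid are hypothetical. Several cells map to the same bucket here, but only in rectangular supercells.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

def cell_of(x: Sequence[float], scales: List[List[float]]) -> Tuple[int, ...]:
    """Return (j_1, ..., j_k) with g_{i j_i} <= x_i < g_{i (j_i + 1)}, as in (2).

    scales[i] lists only the finite boundaries g_{i1} < ... < g_{i(r_i - 1)};
    the outer boundaries -inf and +inf are implicit, and each j_i counts from 0.
    """
    return tuple(bisect_right(g, xi) for xi, g in zip(x, scales))

# Hypothetical 2-d grid: 3 ranges in the first coordinate, 2 in the second.
scales = [[10.0, 20.0], [0.0]]
# Directory from cells to buckets; buckets A and D each cover a 1-by-2 supercell.
directory: Dict[Tuple[int, ...], str] = {
    (0, 0): "A", (0, 1): "A",
    (1, 0): "B", (1, 1): "C",
    (2, 0): "D", (2, 1): "D",
}
cell = cell_of((12.5, -3.0), scales)
print(cell, directory[cell])   # (1, 0) B
```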

A simpler  way  to  deal  with  orthogonal  range  queries  was  introduced  by  J.  L. 
Bentley  and  R.  A.  Finkel,  using  structures  called  quadtrees  [Acta  Informatica  4 
(1974),  1-9].  In  the  two-dimensional  case  of  their  construction,  every  node  of 
such  a tree  represents  a rectangle  and  also  contains  one  of  the  points  in  that 
rectangle;  there  are  four  subtrees,  corresponding  to  the  four  quadrants  of  the 
original  rectangle  relative  to  the  coordinates  of  the  given  point.  Similarly,  in 
three dimensions there is eight-way branching, and the trees are sometimes called
octrees. A $k$-dimensional quadtree has $2^k$-way branching.
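A point quadtree can be sketched compactly if the $2^k$ subtrees of a node are numbered by orthant, with bit i of the subtree index recording whether coordinate i is at least the node's coordinate. This minimal sketch assumes that convention (so ties go to the upper side) and adds an orthogonal range search that skips every orthant the query box cannot reach; it is an illustration, not Bentley and Finkel's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class QNode:
    point: Point
    kids: List[Optional["QNode"]]      # 2^k subtrees, indexed by orthant

def orthant(p: Point, q: Point) -> int:
    """Orthant of q relative to p: bit i is set when q[i] >= p[i]."""
    return sum(1 << i for i in range(len(p)) if q[i] >= p[i])

def qt_insert(root: Optional[QNode], q: Point) -> QNode:
    """Insert q into a point quadtree and return the (possibly new) root."""
    new = QNode(q, [None] * (1 << len(q)))
    if root is None:
        return new
    node = root
    while True:
        c = orthant(node.point, q)
        if node.kids[c] is None:
            node.kids[c] = new
            return root
        node = node.kids[c]

def qt_range(node: Optional[QNode], lo: Point, hi: Point, out: List[Point]) -> List[Point]:
    """Append to out every point p with lo[i] <= p[i] <= hi[i] for all i."""
    if node is None:
        return out
    p = node.point
    if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
        out.append(p)
    for c, child in enumerate(node.kids):
        # Subtree c holds x[i] >= p[i] where bit i of c is 1, else x[i] < p[i];
        # visit it only if the query box can meet that orthant.
        if child is not None and all(
            hi[i] >= p[i] if (c >> i) & 1 else lo[i] < p[i] for i in range(len(p))
        ):
            qt_range(child, lo, hi, out)
    return out

root = None
for pt in [(50, 50), (20, 70), (80, 10), (60, 60), (10, 10)]:   # hypothetical points
    root = qt_insert(root, pt)
print(sorted(qt_range(root, (15, 5), (65, 65), [])))   # [(50, 50), (60, 60)]
```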

The mathematical analysis of random quadtrees is quite difficult, but in
1988 the asymptotic form of the expected insertion time for the $N$th node in a
random $k$-dimensional quadtree was determined to be

$\frac{2}{k}\ln N + O(1)$,   (3)

by two groups of researchers working independently: See L. Devroye and L. Laforest,
SICOMP 19 (1990), 821–832; P. Flajolet, G. Gonnet, C. Puech, and
J. M. Robson, Algorithmica 10 (1993), 473–500. Notice that when $k = 1$, this
result  agrees  with  the  well-known  formula  for  insertion  into  a binary  search  tree, 
Eq.  6.2.2-(5).  Further  work  by  P.  Flajolet,  G.  Labelle,  L.  Laforest,  and  B.  Salvy 
showed  in  fact  that  the  average  internal  path  length  can  be  expressed  in  the 
surprisingly  elegant  form 


SO*  nK). 

i> 2 1=3  j 


(4) 


and  further  analysis  of  random  quadtrees  was  therefore  possible  with  the  help 
of hypergeometric functions [see Random Structures & Algorithms 7 (1995),
117-144]. 

Bentley  went  on  to  simplify  the  quadtree  representation  even  further  by 
introducing “k-d trees,” which have only two-way branching at each node [CACM
18  (1975),  509-517;  IEEE  Transactions  SE-5  (1979),  333-340].  A 1-d  tree  is 
just  an  ordinary  binary  search  tree,  as  in  Section  6.2.2;  a 2-d  tree  is  similar, 
but the nodes on even levels compare x-coordinates and the nodes on odd levels
compare y-coordinates when branching. In general, a k-d tree has nodes with
k coordinates, and the branching on each level is based on only one of the
coordinates; for example, we might branch on coordinate number (l mod k) + 1
on  level  l.  A tie-breaking  rule  based  on  a record’s  serial  number  or  location 
in  memory  can  be  used  to  ensure  that  no  two  records  agree  in  any  coordinate 
position.  Randomly  grown  k-d  trees  turn  out  to  have  exactly  the  same  average 
path  length  and  shape  distribution  as  ordinary  binary  search  trees,  because  the 
assumptions  underlying  their  growth  are  the  same  as  in  the  one-dimensional  case 
(see exercise 6.2.2–6).
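A k-d tree can be sketched in the same spirit. Here the node on level l compares coordinate l mod k (counting coordinates from 0, so this matches coordinate number (l mod k) + 1 above), and ties go to the right; that tie convention is an assumption, used in place of the serial-number rule mentioned above. The region search accepts `None` for an unrestricted bound, so a partial-match query with t specified coordinates is just a box that is degenerate in those t coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Bound = Sequence[Optional[float]]      # None means "no restriction"

@dataclass
class KDNode:
    point: Point
    left: Optional["KDNode"] = None    # points with x[d] <  point[d]
    right: Optional["KDNode"] = None   # points with x[d] >= point[d]

def kd_insert(root: Optional[KDNode], p: Point) -> KDNode:
    """Insert p; a node on level l (root = level 0) compares coordinate l mod k."""
    new = KDNode(p)
    if root is None:
        return new
    node, level = root, 0
    while True:
        d = level % len(p)
        if p[d] < node.point[d]:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:                           # ties go right (assumed convention)
            if node.right is None:
                node.right = new
                return root
            node = node.right
        level += 1

def kd_search(node: Optional[KDNode], lo: Bound, hi: Bound,
              level: int = 0, out: Optional[List[Point]] = None) -> List[Point]:
    """Report points with lo[i] <= p[i] <= hi[i] wherever a bound is given."""
    if out is None:
        out = []
    if node is None:
        return out
    p = node.point
    if all((l is None or l <= x) and (h is None or x <= h) for x, l, h in zip(p, lo, hi)):
        out.append(p)
    d = level % len(p)
    if lo[d] is None or lo[d] < p[d]:    # box may reach the left half
        kd_search(node.left, lo, hi, level + 1, out)
    if hi[d] is None or hi[d] >= p[d]:   # box may reach the right half
        kd_search(node.right, lo, hi, level + 1, out)
    return out

root = None
for pt in [(5, 2), (3, 8), (7, 1), (4, 4), (8, 9)]:   # hypothetical points
    root = kd_insert(root, pt)
print(sorted(kd_search(root, [3, None], [5, None])))   # [(3, 8), (4, 4), (5, 2)]
```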
If the file is not changing dynamically, we can balance any $N$-node k-d tree
so that its height is ≈ lg N, by choosing a median value for branching at each
node. Then we can be sure that several fundamental types of queries will be
handled efficiently. For example, Bentley proved that we can identify all records
that have t specified coordinates in $O(N^{1-t/k})$ steps. We can also find all records
that lie in a given rectangular region in at most $O(tN^{1-1/k} + q)$ steps, if t of the
coordinates are restricted to subranges and there are q such records altogether
[D. T. Lee and C. K. Wong, Acta Informatica 9 (1977), 23–29]. In fact, if the
given  region  is  nearly  cubical  and  q is  small,  and  if  the  coordinate  chosen  for 
branching  at  each  node  has  the  greatest  spread  of  attribute  values,  Friedman, 
Bentley,  and  Finkel  [ACM  Trans.  Math.  Software  3 (1977),  209-226]  showed 
that the average time for such a region query will be only $O(\log N + q)$. The
same formula applies when searching such k-d trees for the nearest neighbor of
a given point in $k$-dimensional space.

When k-d trees are random instead of perfectly balanced, the average running
time for partial matches of t specified coordinates increases slightly to
$\Theta(N^{1-t/k+f(t/k)})$; here the function $f$ is defined implicitly by the equation

$(f(x) + 3 - x)^x \, (f(x) + 2 - x)^{1-x} = 2$,   (5)

and  it  is  quite  small:  We  have 

0 < f(x)  < 0.06329  33881  23738  85718  14011  27797  33590  58170-,  (6) 

and  the  maximum  occurs  when  x is  near  0.585.  [See  P.  Flajolet  and  C.  Puech, 
JACM  33  (1986),  371-407,  §3.] 
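Because f is defined only implicitly by (5), a numerical check is reassuring. The sketch below solves (5) for f(x) by bisection on its logarithmic form, then locates the maximum with a golden-section search; that search assumes f is unimodal on [0, 1], and both numerical methods are our choice rather than the source's. At x = 1/2 the equation reduces to the quadratic (f + 5/2)(f + 3/2) = 4, giving f(1/2) = (√17 − 4)/2 as an independent check.

```python
import math

LN2 = math.log(2)

def f(x: float) -> float:
    """Solve (f + 3 - x)^x (f + 2 - x)^(1 - x) = 2 for f, where 0 <= x <= 1.

    In logarithmic form g(y) = x ln(y+3-x) + (1-x) ln(y+2-x) - ln 2 is
    increasing in y, and the root lies in [0, 1], so bisection suffices.
    """
    def g(y: float) -> float:
        return x * math.log(y + 3 - x) + (1 - x) * math.log(y + 2 - x) - LN2
    lo, hi = 0.0, 1.0
    for _ in range(60):            # halve the bracket well past double precision
        mid = (lo + hi) / 2
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def argmax(fun, a: float = 0.0, b: float = 1.0, iters: int = 80):
    """Golden-section search for a maximum; assumes fun is unimodal on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = fun(c), fun(d)
    for _ in range(iters):
        if fc > fd:                # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = fun(c)
        else:                      # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = fun(d)
    x = (a + b) / 2
    return x, fun(x)

print(f(0.5), (math.sqrt(17) - 4) / 2)   # the two values agree
xmax, fmax = argmax(f)
print(xmax, fmax)                        # the text puts x near 0.585
```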

Because of the aesthetic appeal and great significance of geometric algorithms,
there has been an enormous growth in techniques for solving higher-dimensional
search problems and related questions of many kinds. Indeed, a
new subfield of mathematics and computer science called Computational Geometry
has developed rapidly since the 1970s. The Handbook of Discrete and
Computational  Geometry,  edited  by  J.  E.  Goodman  and  J.  O’Rourke  (Boca 
Raton,  Florida:  CRC  Press,  1997),  is  an  excellent  reference  to  the  state  of  the 
art  in  that  field  as  of  1997. 

A comprehensive  survey  of  data  structures  and  algorithms  for  the  important 
special  cases  of  two-  and  three-dimensional  objects  has  been  prepared  by  Hanan 
Samet  in  a pair  of  complementary  books,  The  Design  and  Analysis  of  Spatial 
Data  Structures  and  Applications  of  Spatial  Data  Structures  (Addison-Wesley, 
1990).  Samet  points  out  that  the  original  quadtrees  of  Bentley  and  Finkel  are 
now  more  properly  called  “point  quadtrees”;  the  name  “quadtree”  itself  has 
become  a generic  term  for  any  hierarchical  decomposition  of  geometric  data. 

Compound  attributes.  It  is  possible  to  combine  two  or  more  attributes  into 
one  super-attribute.  For  example,  a (CLASS,  MAJOR)  attribute  could  be  created 
by  combining  the  CLASS  and  MAJOR  fields  of  a university  enrollment  file.  In  this 
way  queries  can  often  be  satisfied  by  taking  the  union  of  disjoint,  short  lists 
instead  of  the  intersection  of  longer  lists. 
The idea of attribute combination was developed further by V. Y. Lum
[CACM 13 (1970), 660–665], who suggested ordering the inverted lists of combined
attributes lexicographically from left to right, and making multiple copies,
with  the  individual  attributes  permuted  in  a clever  way.  For  example,  suppose 
that  we  have  three  attributes  A,  B,  and  C;  we  can  form  three  compound  attributes 

(A,  B,  C),  (B,  C,  A),  (C,  A,  B)  (7) 

and  construct  ordered  inverted  lists  for  each  of  these.  (Thus  in  the  first  list,  the 
records  occur  in  order  of  their  A values,  with  all  records  of  the  same  A value  in 
order  by  B and  then  by  C.)  This  organization  makes  it  possible  to  satisfy  queries 
based  on  any  combination  of  the  three  attributes;  for  example,  all  records  having 
specified  values  for  A and  C will  appear  consecutively  in  the  third  list. 

Similarly,  from  four  attributes  A,  B,  C,  D,  we  can  form  the  six  combined 
attributes 

(A,  B,  C,  D),  (B,  C,  D,  A),  (B,  D,  A,  C),  (C,A,D,B),  (C,D,A,B),  (D,A,B,C),  (8) 

which suffice to answer all combinations of simple queries relating to the simultaneous
values of one, two, three, or four of the attributes. There is a general
procedure for constructing $\binom{n}{k}$ combined attributes from n attributes, where
$k \le n/2$, such that all records having specified combinations of at most k or
at least n − k of the attribute values will appear consecutively in one of the
combined  attribute  lists  (see  exercise  1).  Alternatively,  we  can  get  by  with 
fewer  combinations  when  some  attributes  have  a limited  number  of  values.  For 
example,  if  D is  simply  a two-valued  attribute,  the  three  combinations 

(D,  A,  B,  C),  (D,  B,  C,  A),  (D,C,A,B)  (9) 

obtained  by  placing  D in  front  of  (7)  will  be  almost  as  good  as  (8)  with  only  half 
the  redundancy,  since  queries  that  do  not  depend  on  D can  be  treated  by  looking 
in  just  two  places  in  one  of  the  lists. 
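Whether a family of orderings such as (7) or (8) covers every combination can be checked mechanically. A query that specifies exactly the attributes in a set S finds its records consecutively in a list whenever S equals the set of the first |S| attributes of that list's ordering. The sketch below tests every nonempty subset against that criterion, with single-letter attribute names for brevity. The last line shows, for contrast, that the four cyclic shifts of (A, B, C, D) alone would not suffice.

```python
from itertools import combinations
from typing import Iterable, List, Sequence

def uncovered(attrs: Sequence[str], orders: Iterable[str]) -> List[frozenset]:
    """Attribute subsets that are not the prefix set of any given ordering."""
    prefixes = {frozenset(o[:m]) for o in orders for m in range(1, len(o) + 1)}
    return [frozenset(s) for m in range(1, len(attrs) + 1)
            for s in combinations(attrs, m) if frozenset(s) not in prefixes]

print(uncovered("ABC", ["ABC", "BCA", "CAB"]))                              # (7): []
print(uncovered("ABCD", ["ABCD", "BCDA", "BDAC", "CADB", "CDAB", "DABC"]))  # (8): []
print(uncovered("ABCD", ["ABCD", "BCDA", "CDAB", "DABC"]))  # misses {A,C} and {B,D}
```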

Binary  attributes.  It  is  instructive  to  consider  the  special  case  in  which  all 
attributes are two-valued. In a sense this is the opposite of combining attributes,
since  we  can  represent  any  value  as  a binary  number  and  regard  the  individual 
bits  of  that  number  as  separate  attributes.  Table  1 shows  a typical  file  involving 
“yes-no”  attributes;  in  this  case  the  records  stand  for  selected  cookie  recipes, 
and  the  attributes  specify  which  ingredients  are  used.  For  example,  Almond 
Lace  Wafers  are  made  from  butter,  flour,  milk,  nuts,  and  granulated  sugar.  If 
we  think  of  Table  1 as  a matrix  of  zeros  and  ones,  the  transpose  of  the  matrix  is 
the  inverted  file,  in  bitstring  form. 

The  right-hand  column  of  Table  1 is  used  to  indicate  special  items  that  occur 
only  rarely.  These  can  be  coded  in  a more  efficient  way  than  to  devote  an  entire 
column  to  each  one;  and  the  “Cornstarch”  column  could  be  treated  similarly. 
Dually,  we  could  find  a more  efficient  way  to  encode  the  “Flour”  column,  since 
flour  occurs  in  everything  except  Meringues.  For  the  present,  however,  let  us 
sidestep  these  considerations  and  simply  ignore  the  “Special  ingredients”  column. 
Table 1

A FILE WITH BINARY ATTRIBUTES
Let us define a basic query in a binary attribute file as a request for all records
having 0’s in certain columns, 1’s in other columns, and arbitrary values in the
remaining columns. Using “*” to stand for an arbitrary value, we can represent
any basic query as a sequence of 0’s, 1’s, and *’s. For example, consider a man
who  is  in  the  mood  for  some  coconut  cookies,  but  he  is  allergic  to  chocolate, 
hates  anise,  and  has  run  out  of  vanilla  extract;  he  can  formulate  the  query 

*0****0**1*******************0.  (10) 

Table  1 now  says  that  Delicious  Prune  Bars  are  just  the  thing. 

Before  we  consider  the  general  problem  of  organizing  a file  for  basic  queries, 
it is important to look at the special case where no 0’s are specified, only 1’s
and *’s. This may be called an inclusive query, because it asks for all records
that include a certain set of attributes, if we assume that 1’s denote attributes
that  are  present  and  0’s  denote  attributes  that  are  absent.  For  example,  the 
recipes  in  Table  1 that  call  for  both  baking  powder  and  baking  soda  are  Glazed 
Gingersnaps  and  Old-Fashioned  Sugar  Cookies. 
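A basic query can be tested against a record with two masks: one marking the columns that are specified, the other giving the required values in those columns. A minimal sketch follows; the four-column rows and their names are hypothetical (they are not taken from Table 1), and column j is taken to be the jth character from the left. An inclusive query is simply a pattern that contains no 0’s.

```python
from typing import Dict, List, Tuple

def compile_query(pattern: str) -> Tuple[int, int]:
    """Turn a string of 0, 1, * into (care, want) masks; column j is bit j."""
    care = want = 0
    for col, ch in enumerate(pattern):
        if ch != "*":
            care |= 1 << col          # this column is specified
            if ch == "1":
                want |= 1 << col      # and it must contain a 1
    return care, want

def row_bits(row: str) -> int:
    """Pack a row of 0's and 1's into an integer, column j as bit j."""
    return sum(1 << col for col, ch in enumerate(row) if ch == "1")

def basic_query(rows: Dict[str, str], pattern: str) -> List[str]:
    """Names of the rows that agree with pattern in every specified column."""
    care, want = compile_query(pattern)
    return [name for name, row in rows.items() if (row_bits(row) & care) == want]

rows = {"R1": "1101", "R2": "1000", "R3": "0110", "R4": "1100"}   # hypothetical
print(basic_query(rows, "1*0*"))   # ['R1', 'R2', 'R4']
print(basic_query(rows, "11**"))   # inclusive query: ['R1', 'R4']
```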

In  some  applications  it  is  sufficient  to  provide  for  the  special  case  of  inclusive 
queries.  This  occurs,  for  example,  in  the  case  of  many  manual  card-filing  systems, 
such  as  “edge-notched  cards”  or  “feature  cards.”  An  edge-notched  card  system 
corresponding  to  Table  1 would  have  one  card  for  every  recipe,  with  holes  cut 
out  for  each  ingredient  (see  Fig.  46).  In  order  to  process  an  inclusive  query,  the 
file  of  cards  is  arranged  into  a neat  deck  and  needles  are  put  in  each  column 
position  corresponding  to  an  attribute  that  is  to  be  included.  After  raising  the 
needles,  all  cards  having  the  appropriate  attributes  will  drop  out. 


Fig.  46.  An  edge-notched  card. 


A feature-card system works on the inverse file in a similar way. In this
case there is one card for every attribute, and holes are punched in designated
positions on the surface of the card for every record possessing that attribute.
An ordinary 80-column card can therefore be used to tell which of 12 × 80 = 960
records have a given attribute. To process an inclusive query, the feature cards
for the specified attributes are selected and put together; then light will shine
through all positions corresponding to the desired records. This operation is
analogous to the treatment of Boolean queries by intersecting inverted bit strings
as explained above.


Table 2

AN EXAMPLE OF SUPERIMPOSED CODING

Codes for individual flavorings

    Almond extract    0100000001    Dates             1000000100
    Allspice          0000100001    Ginger            0000110000
    Anise seed        0000011000    Honey             0000000011
    Applesauce        0010010000    Lemon juice       1000100000
    Apricots          1000010000    Lemon peel        0011000000
    Bananas           0000100010    Mace              0000010100
    Candied cherries  0000101000    Molasses          1001000000
    Cardamom          1000000001    Nutmeg            0000010010
    Chocolate         0010001000    Nuts              0000100100
    Cinnamon          1000000010    Oranges           0100000100
    Citron            0100000010    Peanut butter     0000000101
    Cloves            0001100000    Pepper            0010000100
    Coconut           0001010000    Prunes            0010000010
    Coffee            0001000100    Raisins           0101000000
    Currant jelly     0010000001    Vanilla extract   0000001001

Superimposed codes

    Almond Lace Wafers        0000100100    Lebkuchen Rounds             1011110111
    Applesauce-Spice Squares  1111111111    Meringues                    1000101100
    Banana-Oatmeal Cookies    1000111111    Moravian Spice Cookies       1001110011
    Chocolate Chip Cookies    0010101101    Oatmeal-Date Bars            1000100100
    Coconut Macaroons         0001111101    Old-Fashioned Sugar Cookies  0000011011
    Cream-Cheese Cookies      0010001001    Peanut-Butter Pinwheels      0010001101
    Delicious Prune Bars      0111110110    Petticoat Tails              0000001001
    Double-Chocolate Drops    0010101100    Pfeffernuesse                1111111111
    Dream Bars                0001111101    Scotch Oatmeal Shortbread    0000001001
    Filled Turnovers          1011101101    Shortbread Stars             0000000000
    Finska Kakor              0100100101    Springerle                   0011011000
    Glazed Gingersnaps        1001110010    Spritz Cookies               0000001001
    Hermits                   1101010110    Swedish Kringler             0000000000
    Jewel Cookies             0010101101    Swiss-Cinnamon Crisps        1000000010
    Jumbles                   1000001011    Toffee Bars                  0010101101
    Kris Kringles             1011100101    Vanilla-Nut Icebox Cookies   0000101101

Superimposed coding. The reason these manual card systems are of special
interest to us is that ingenious schemes have been devised to save space on
edge-notched cards; the same principles can be applied in the representation of
computer files. Superimposed coding is a technique similar to hashing, and it was
actually invented several years before hashing itself was discovered. The idea is to
map attributes into random k-bit codes in an n-bit field, and to superimpose the
codes for each attribute that is present in a record. An inclusive query for some
set of attributes can be converted into an inclusive query for the corresponding
superimposed bit codes. A few extra records may satisfy this query, but the
number of such “false drops” can be statistically controlled. [See Calvin N.
Mooers, Amer. Chem. Soc. Meeting 112 (September 1947), 14E-15E; American
Documentation 2 (1951), 20-32.]

As an example of superimposed coding, let’s consider Table 1 again, but only
the flavorings instead of the basic ingredients like baking powder, shortening,
eggs, and flour. Table 2 shows what happens if we assign random 2-bit codes in
a 10-bit field to each of the flavoring attributes and superimpose the coding. For
example, the entry for Chocolate Chip Cookies is obtained by superimposing the
codes for chocolate, nuts, and vanilla:

0010001000 | 0000100100 | 0000001001 = 0010101101.

The superimposition of these codes also yields some spurious attributes, in this
case allspice, candied cherries, currant jelly, peanut butter, and pepper; these
will cause false drops to occur on certain queries (and they also suggest the
creation of a new recipe called False Drop Cookies!).
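Superimposing codes is a bitwise OR, and an inclusive query asks whether every 1-bit of the query code is also on in the record. The following Python sketch, which copies only a handful of the flavoring codes from Table 2 (the dictionary and function names are ours, for illustration), reproduces the Chocolate Chip Cookies entry and recovers its spurious attributes:

```python
# A subset of the flavoring codes from Table 2 (10-bit fields).
CODES = {
    "chocolate": "0010001000",
    "nuts": "0000100100",
    "vanilla extract": "0000001001",
    "allspice": "0000100001",
    "candied cherries": "0000101000",
    "currant jelly": "0010000001",
    "mace": "0000010100",
    "peanut butter": "0000000101",
    "pepper": "0010000100",
}


def superimpose(codes, width=10):
    """Bitwise OR of several equal-length bit strings."""
    value = 0
    for code in codes:
        value |= int(code, 2)
    return format(value, f"0{width}b")


def may_have(record_code, attribute_code):
    """Inclusive test: every 1-bit of the attribute code is on in the record."""
    r, a = int(record_code, 2), int(attribute_code, 2)
    return (r & a) == a


actual = ["chocolate", "nuts", "vanilla extract"]
chip = superimpose(CODES[name] for name in actual)
spurious = sorted(name for name, code in CODES.items()
                  if name not in actual and may_have(chip, code))
print(chip)      # 0010101101
print(spurious)  # ['allspice', 'candied cherries', 'currant jelly', 'peanut butter', 'pepper']
```

Mace is included as a contrast: its code 0000010100 needs a bit that the Chocolate Chip code lacks, so it does not produce a false drop here.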

Superimposed coding actually doesn’t work very well in Table 2, because
that table is a small example with lots of attributes present. In fact, Applesauce-Spice
Squares will drop out for every query, since it was obtained by superimposing
seven codes that cover all ten positions; and Pfeffernuesse is even worse,
obtained by superimposing twelve codes. On the other hand Table 2 works
surprisingly well in some respects; for example, if we try the query “Vanilla
extract”, only the record for Pfeffernuesse comes out as a false drop.

A more appropriate example of superimposed coding occurs if we have, say,
a 32-bit field and a set of $\binom{32}{3} = 4960$ different attributes, where each record is
allowed to possess up to six attributes and each attribute is encoded by specifying
3 of the 32 bits. In this situation, if we assume that each record has six randomly
selected attributes, the probability of a false drop in an inclusive query

    on one attribute is      .07948358
    on two attributes is     .00708659
    on three attributes is   .00067094
    on four attributes is    .00006786
    on five attributes is    .00000728
    on six attributes is     .00000082

Thus if there are M records that do not actually satisfy a two-attribute query,
about .007M will have a superimposed code that spuriously matches all code bits
of the two specified attributes. (These probabilities are computed in exercise 4.)
The total number of bits needed in the inverted file is only 32 times the number
of records, which is less than half the number of bits needed to specify the
attributes themselves in the original file.
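Exercise 4 derives these probabilities exactly. As a rough cross-check of the first entry only, the sketch below evaluates a one-attribute false drop by inclusion and exclusion under a simplified model that we assume here: each attribute owns a distinct 3-element subset of the 32 bits, and a record's six attributes are distinct, uniformly random, and different from the queried attribute A. Under this assumption the result is close to 0.08, near but not identical to .07948358, so the precise model of exercise 4 presumably differs from ours in some detail.

```python
from math import comb


def one_attribute_false_drop(n_bits=32, code_bits=3, attributes=6):
    """Chance that a record lacking attribute A still has all of A's code bits on.

    Assumed model (ours, for illustration): every attribute has its own
    code_bits-element subset of the n_bits positions, and a record's attributes
    are distinct, uniformly random, and different from A.
    """
    others = comb(n_bits, code_bits) - 1            # codes other than A's own
    total = comb(others, attributes)                 # equally likely records
    p = 0.0
    # Inclusion-exclusion over the set T of A's bits that are required to stay off.
    for t in range(code_bits + 1):
        avoiding = comb(n_bits - t, code_bits)       # codes missing every bit of T
        if t == 0:
            avoiding -= 1                            # exclude A's own code
        p += (-1) ** t * comb(code_bits, t) * comb(avoiding, attributes) / total
    return p


print(round(one_attribute_false_drop(), 4))
```

For t ≥ 1 the code of A itself contains the bits of T, so it is automatically absent from the codes that avoid T.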
If carefully selected nonrandom codes are used, it is possible to avoid false
drops entirely in superimposed coding, as shown by W. H. Kautz and R. C.
Singleton, IEEE Trans. IT-10 (1964), 363-377; one of their constructions appears
in exercise 16.

Malcolm C. Harrison [CACM 14 (1971), 777-779] has observed that superimposed
coding can be used to speed up text searching. Assume that we want
to locate all occurrences of a particular string of characters in a long body of
text, without building an extensive table as in Algorithm 6.3P; and assume, for
example, that the text is divided into individual lines $c_1c_2\ldots c_{50}$ of 50 characters
each. Harrison suggests encoding each of the 49 pairs $c_1c_2, c_2c_3, \ldots, c_{49}c_{50}$ by
hashing each of them into a number between 0 and 127, say; then the “signature”
of the line $c_1c_2\ldots c_{50}$ is the string of 128 bits $b_0b_1\ldots b_{127}$, where $b_i = 1$ if and
only if $h(c_jc_{j+1}) = i$ for some $j$.

If now we want to search for all occurrences of the word NEEDLE in a large
text file called HAYSTACK, we simply look for all lines whose signature contains
1-bits in positions h(NE), h(EE), h(ED), h(DL), and h(LE). Assuming that the
hash function is random, the probability that a random line contains all these
bits in its signature is only 0.00341 (see exercise 4); hence the intersection of
five inverted-list bit strings will rapidly identify all the lines containing NEEDLE,
together with a few false drops.
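A small Python sketch of Harrison's idea follows. The pair hash (CRC-32 from the standard zlib module, reduced mod 128) is a hypothetical choice, and the sample lines are much shorter than 50 characters for brevity. For each of the 128 bit positions it keeps an inverted bit string over the lines, intersects the strings for the pairs of the search word, and then checks the surviving lines directly to discard false drops:

```python
import zlib

SIG_BITS = 128


def pair_hash(pair):
    """Hypothetical hash of a character pair into 0..127 (CRC-32 based)."""
    return zlib.crc32(pair.encode()) % SIG_BITS


def signature(line):
    """Bit i of the result is 1 iff h(c_j c_{j+1}) = i for some j."""
    sig = 0
    for j in range(len(line) - 1):
        sig |= 1 << pair_hash(line[j:j + 2])
    return sig


def build_inverted(lines):
    """inverted[i] has bit L on iff the signature of line L has bit i on."""
    inverted = [0] * SIG_BITS
    for number, line in enumerate(lines):
        sig = signature(line)
        for i in range(SIG_BITS):
            if (sig >> i) & 1:
                inverted[i] |= 1 << number
    return inverted


def search(lines, inverted, word):
    """Intersect the inverted bit strings for the word's pairs; drop false drops."""
    hits = (1 << len(lines)) - 1
    for j in range(len(word) - 1):
        hits &= inverted[pair_hash(word[j:j + 2])]
    drops = [n for n in range(len(lines)) if (hits >> n) & 1]
    return [n for n in drops if word in lines[n]]


haystack = ["THE NEEDLE IS IN THE HAYSTACK", "NO SHARP OBJECTS HERE", "A NEEDLE AND THREAD"]
print(search(haystack, build_inverted(haystack), "NEEDLE"))   # [0, 2]
```

Because every pair of a matching line sets its bit, the intersection can never miss a true occurrence within a line; only the final substring test is needed to remove false drops.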

The assumption of randomness is not really justified in this application,
since typical text has so much redundancy; the distribution of adjacent letter
pairs in English words is highly biased. For example, it will probably be very
helpful to discard all pairs $c_jc_{j+1}$ containing a blank character, since blanks are
usually much more common than any other symbol.

Another interesting application of superimposed coding to search problems
has been suggested by Burton H. Bloom [CACM 13 (1970), 422-426]; his method
actually applies to primary key retrieval, although it is most appropriate for us
to discuss it in this section. Imagine a search application with a large database
in which no calculation needs to be done if the search was unsuccessful. For
example, we might want to check somebody’s credit rating or passport number,
and if no record for that person appears in the file we don’t have to investigate
further. Similarly in an application to computerized typesetting, we might have
a simple algorithm that hyphenates most words correctly, but it fails on some
50,000 exceptional words; if we don’t find the word in the exception file we are
free to use the simple algorithm.

In such situations it is possible to maintain a bit table in internal memory
so that most keys not in the file can be recognized as absent without making
any references to the external memory. Here’s how: Let the internal bit table
be $b_0b_1\ldots b_{M-1}$, where M is rather large. For each key $K_j$ in the file, compute
k independent hash functions $h_1(K_j), \ldots, h_k(K_j)$, and set the corresponding k
b’s equal to 1. (These k values need not be distinct.) Thus $b_i = 1$ if and only
if $h_l(K_j) = i$ for some j and l. Now to determine if a search argument K is
in the external file, first test whether or not $b_{h_l(K)} = 1$ for $1 \le l \le k$; if not,
there is no need to access the external memory, but if so, a conventional search
will probably find K if k and M have been chosen properly. The chance of a
false drop when there are N records in the file is approximately $(1 - e^{-kN/M})^k$.
In a sense, Bloom’s method treats the entire file as one record, with the primary
keys as the attributes that are present, and with superimposed coding in a huge
M-bit field.
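A minimal Python sketch of Bloom's bit table is given below. It assumes, purely for illustration, that the k hash functions are obtained by hashing the string "l:key" with SHA-256 from the standard hashlib module; the class and method names are ours. Storing one byte per table bit keeps the code short at the cost of eightfold space.

```python
import hashlib
import math


class BloomFilter:
    """Internal bit table b_0 ... b_{M-1} with k hash functions h_1 ... h_k.

    Deriving h_l from SHA-256 of "l:key" is an assumption of this sketch;
    any k independent-looking hash functions would serve.
    """

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)            # one byte per table bit, for clarity

    def _positions(self, key):
        for ell in range(1, self.k + 1):
            digest = hashlib.sha256(f"{ell}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for i in self._positions(key):
            self.bits[i] = 1

    def might_contain(self, key):
        """False: certainly absent, skip external memory. True: do a real search."""
        return all(self.bits[i] for i in self._positions(key))


def approximate_false_drop_rate(n, m, k):
    """(1 - e^{-kN/M})^k, the approximation quoted in the text."""
    return (1 - math.exp(-k * n / m)) ** k


exceptions = BloomFilter(m=10_000, k=7)
for i in range(1000):
    exceptions.add(f"word{i}")
print(exceptions.might_contain("word42"))                      # True
print(round(approximate_false_drop_rate(1000, 10_000, 7), 4))  # 0.0082
```

With N = 1000 keys in an M = 10,000-bit table and k = 7, fewer than one unsuccessful search in a hundred should need to touch the external file.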

Still another variation of superimposed coding has been developed by Richard
A. Gustafson [Ph.D. thesis (Univ. South Carolina, 1969)]. Suppose that
we have N records and that each record possesses six attributes chosen from
a set of 10,000 possibilities. The records may, for example, stand for technical
articles and the attributes may be keywords describing the article. Let h be a
hash function that maps each attribute into a number between 0 and 15. If a
record has attributes $a_1, a_2, \ldots, a_6$, Gustafson suggests mapping the record into
the 16-bit number $b_0b_1\ldots b_{15}$, where $b_i = 1$ if and only if $h(a_j) = i$ for some j;
and furthermore if this method results in only k of the b’s equal to 1, for k < 6,
another $6 - k$ 1’s are supplied by some random method (not necessarily depending
on the record itself). There are $\binom{16}{6} = 8008$ sixteen-bit codes in which exactly
six 1-bits are present, and with luck about N/8008 records will be mapped into
each value. We can keep 8008 lists of records, directly calculating the address
corresponding to $b_0b_1\ldots b_{15}$ using a suitable formula. In fact, if the 1’s occur in
positions $0 \le p_1 < p_2 < \cdots < p_6$, the function

$$\binom{p_1}{1} + \binom{p_2}{2} + \cdots + \binom{p_6}{6}$$

will convert each string $b_0b_1\ldots b_{15}$ into a unique number between 0 and 8007,
as we have seen in exercises 1.2.6-56 and 2.2.6-7.

Now if we want to find all records having three particular attributes $A_1$, $A_2$,
$A_3$, we compute $h(A_1)$, $h(A_2)$, $h(A_3)$; assuming that these three values are
distinct, we need only look at the records stored in the $\binom{13}{3} = 286$ lists whose
bit code $b_0b_1\ldots b_{15}$ contains 1’s in those three positions. In other words, only
286/8008 = 1/28 ≈ 3.6 percent of the records need to be examined in the search.
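Gustafson's scheme can be sketched in Python as follows. The attribute hash h (CRC-32 reduced mod 16) and the seeded generator used for padding are hypothetical choices of ours; the list number uses the binomial-coefficient formula above, and the query routine enumerates the lists that contain 1's in the three hashed positions.

```python
import random
import zlib
from itertools import combinations
from math import comb


def h(attribute):
    """Hypothetical hash of an attribute into 0..15."""
    return zlib.crc32(attribute.encode()) % 16


def code_positions(attributes, rng):
    """Positions of the six 1-bits: h(a) for each of six attributes, padded at random."""
    positions = {h(a) for a in attributes}
    while len(positions) < 6:
        positions.add(rng.randrange(16))
    return sorted(positions)


def list_number(positions):
    """Map 0 <= p1 < ... < p6 <= 15 to binom(p1,1) + ... + binom(p6,6), in 0..8007."""
    return sum(comb(p, i) for i, p in enumerate(positions, start=1))


def lists_to_search(query_positions):
    """List numbers of all six-bit codes having 1's at the given distinct positions."""
    fixed = set(query_positions)
    rest = [p for p in range(16) if p not in fixed]
    return [list_number(sorted(fixed | set(extra)))
            for extra in combinations(rest, 6 - len(fixed))]


record = ["hashing", "tries", "sorting", "search", "codes", "files"]
print(list_number(code_positions(record, random.Random(1))))
print(len(lists_to_search([2, 7, 11])))   # 286
```

The list numbers form a one-to-one correspondence with the 8008 six-bit codes, so each of the 286 lists searched for a three-attribute query is distinct.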

See the article by C. S. Roberts, Proc. IEEE 67 (1979), 1624-1642, for an
excellent exposition of superimposed coding, together with an application to a
large database of telephone-directory listings. An application to spelling-check
software is discussed by J. K. Mullin and D. J. Margoliash, Software Practice &
Exper. 20 (1990), 625-630.

Combinatorial hashing. The idea underlying Gustafson’s method just described
is to find some way to map the records into memory locations so that
comparatively few locations are relevant to a particular query. But his method
applies only to inclusive queries when the individual records possess few attributes.
Another type of mapping, designed to handle arbitrary basic queries
like (10) consisting of 0’s, 1’s, and *’s, was discovered by Ronald L. Rivest in
1971. [See SICOMP 5 (1976), 19-50.]

Suppose first that we wish to construct a crossword-puzzle dictionary for
all six-letter words of English; a typical query asks for all words of the form
N**D*E, say, and gets the reply {NEEDLE, NIDDLE, NODDLE, NOODLE, NUDDLE}. We
can solve this problem nicely by keeping $2^{12}$ lists, putting the word NEEDLE into
list number

h(N) h(E) h(E) h(D) h(L) h(E).

Here h is a hash function taking each letter into a 2-bit value, and we get a 12-bit
list address by putting the six bit-pairs together. Then the query N**D*E can be
answered by looking through just 64 of the 4096 lists, since its three unspecified
letters leave $2^{3 \cdot 2} = 64$ possible addresses.
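The crossword dictionary can be sketched in Python as below. The 2-bit letter hash is a hypothetical choice (any fixed map from letters to 0..3 would do), and the function names are ours. A query enumerates only the list addresses consistent with its specified letters and then checks each stored word, since several words can share one list.

```python
from collections import defaultdict
from itertools import product


def h(letter):
    """Hypothetical 2-bit hash of a letter."""
    return (ord(letter) - ord("A")) % 4


def list_number(bit_pairs):
    """Juxtapose 2-bit values into one list address (12 bits for six letters)."""
    address = 0
    for pair in bit_pairs:
        address = (address << 2) | pair
    return address


def build(words):
    lists = defaultdict(list)
    for word in words:
        lists[list_number(h(c) for c in word)].append(word)
    return lists


def query(lists, pattern):
    """Return (number of lists examined, matching words) for a pattern like N**D*E."""
    choices = [range(4) if c == "*" else [h(c)] for c in pattern]
    examined, found = 0, []
    for pairs in product(*choices):
        examined += 1
        for word in lists.get(list_number(pairs), []):
            # Other words may share this list, so check the specified letters.
            if all(p in ("*", c) for p, c in zip(pattern, word)):
                found.append(word)
    return examined, sorted(found)


words = ["NEEDLE", "NIDDLE", "NODDLE", "NOODLE", "NUDDLE", "NAPKIN", "MIDDLE"]
print(query(build(words), "N**D*E"))
# (64, ['NEEDLE', 'NIDDLE', 'NODDLE', 'NOODLE', 'NUDDLE'])
```

The same juxtaposition idea carries over unchanged to records with several secondary keys, each hashed to a few bits.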

Similarly let’s suppose that we have 1,000,000 records each containing 10
secondary keys, where each secondary key has a fairly large number of possible
values. We can map the records whose secondary keys are $(K_1, K_2, \ldots, K_{10})$
into the 20-bit number

$$h(K_1)\,h(K_2)\ldots h(K_{10}), \qquad (12)$$

where h is a hash function taking each secondary key into a 2-bit value, and
(12) stands for the juxtaposition of these ten pairs of bits. This scheme maps
1,000,000 records into $2^{20} = 1,048,576$ possible values, and we can consider the
total mapping as a hash function with $M = 2^{20}$; chaining can be used to resolve
collisions. If we want to retrieve all records having specified values of any five
secondary keys, we need to look at only $2^{10}$ lists, corresponding to the five
unspecified bit pairs in (12); thus only about $1000 \approx \sqrt{N}$ records need to be
examined on the average. (A similar approach was suggested by M. Arisawa,
J. Inf. Proc. Soc. Japan 12 (1971), 163-167, and by B. Dwyer (unpublished).
Dwyer suggested using a more flexible mapping than (12), namely

$$\bigl(h_1(K_1) + h_2(K_2) + \cdots + h_{10}(K_{10})\bigr) \bmod M,$$

where M is any convenient number, and the $h_i$ are arbitrary hash functions,
possibly of the form $W_iK_i$ for “random” $W_i$.)

Rivest has developed this idea further so that in many cases we have the
following situation. Assume that there are $N \approx 2^n$ records, each having m
secondary keys. Each record is mapped into an n-bit hash address, in such a
way that a query that leaves the values of k keys unspecified corresponds to
approximately $N^{k/m}$ hash addresses. All the other methods we have discussed
in this section (except Gustafson’s) require order N steps for retrieval, although
the constant of proportionality is small; for large enough N, Rivest’s method
will be faster, and it requires no inverted files.

But we have to define an appropriate mapping before we can apply this
technique. Here is an example with small parameters, when m = 4 and n = 3
and when all secondary keys are binary-valued; we can map 4-bit records into
eight addresses as follows:

    *001 → 0        *110 → 4
    0*00 → 1        1*11 → 5
    10*0 → 2        01*1 → 6
    110* → 3        001* → 7          (13)


An examination of this table reveals that all records corresponding to the query
0*** are mapped into locations 0, 1, 4, 6, and 7; and similarly any basic
query with three *’s corresponds to exactly five locations. The basic queries
with two *’s correspond to three locations each; and the basic queries with one *
correspond to either one or two locations, (8 × 1 + 24 × 2)/32 = 1.75 on the
average. Thus we have

    Number of unspecified    Number of locations
    bits in the query        to search

          4                  8    = 8^{4/4}
          3                  5    ≈ 8^{3/4}
          2                  3    ≈ 8^{2/4}
          1                  1.75 ≈ 8^{1/4}
          0                  1    = 8^{0/4}          (14)


Of course this is such a small example that we could handle it more easily by
brute force. But it leads to nontrivial applications, since we can use it also
when m = 4r and n = 3r, mapping 4r-bit records into $2^{3r} \approx N$ locations by
dividing the secondary keys into r groups of 4 bits each and applying (13) in each
group. The resulting mapping has the desired property: A query that leaves k
of the m bits unspecified will correspond to approximately $N^{k/m}$ locations. (See
exercise 6.)
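The mapping (13) and the averages in (14) can be checked by brute force. The Python sketch below (function names ours) treats each pattern of (13) as the set of 4-bit records sent to its address, lists the addresses relevant to a basic query, reproduces table (14) exactly with fractions, and applies (13) groupwise to 4r-bit records as described above.

```python
from fractions import Fraction
from itertools import product

# Mapping (13): the record pattern for each address 0..7.
PATTERNS = ["*001", "0*00", "10*0", "110*", "*110", "1*11", "01*1", "001*"]


def compatible(a, b):
    """True if two strings of 0, 1, * agree wherever both are specified."""
    return all(x == "*" or y == "*" or x == y for x, y in zip(a, b))


def address4(record):
    """3-bit address of a 4-bit record; exactly one pattern of (13) matches."""
    (addr,) = [i for i, p in enumerate(PATTERNS) if compatible(p, record)]
    return addr


def locations4(query):
    """Addresses that must be searched for a 4-bit basic query."""
    return [i for i, p in enumerate(PATTERNS) if compatible(p, query)]


def address(record):
    """Apply (13) to each 4-bit group of a 4r-bit record, giving 3r bits."""
    result = 0
    for g in range(0, len(record), 4):
        result = (result << 3) | address4(record[g:g + 4])
    return result


def addresses_for_query(query):
    """All 3r-bit addresses to search for a 4r-bit basic query."""
    result = [0]
    for g in range(0, len(query), 4):
        result = [(a << 3) | loc for a in result for loc in locations4(query[g:g + 4])]
    return result


def table_14():
    """Average number of locations to search, by number of *'s in a 4-bit query."""
    counts = {}
    for query in product("01*", repeat=4):
        counts.setdefault(query.count("*"), []).append(len(locations4(query)))
    return {k: Fraction(sum(v), len(v)) for k, v in sorted(counts.items())}


print(locations4("0***"))   # [0, 1, 4, 6, 7]
print(table_14())
```

For an 8-bit record (r = 2) the query 0******* needs 5 × 8 = 40 of the 64 addresses, in line with the product rule that the groupwise construction implies.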

A. E. Brouwer [SICOMP 28 (1999), 1970-1971] has found an attractive
way to compress 8 bits to 5, with a mapping analogous to (13). Every 8-bit byte
belongs to exactly one of the following 32 classes:

    0*000*0*   01*0**11   00*11**1   *11**101
    1*000*0*   11*0**11   10*11**1   *11**010
    0*010*0*   01*1**11   00*0*01*   *10*0*10
    1*010*0*   11*1**11   10*0*01*   *10*1*01
                                                   (15)
    *0*1001*   0*10*1*0   0*1*000*   *01*01*1
    1*10*1*0   1*1*000*   *10*10*0   *0*0100*
    0*11*1*0   0*0*11*0   *00*011*   *0*011*1
    1*11*1*0   1*0*11*0   *11*100*   *0*110*0

The *’s in this design are arranged in such a way that there are 3 in each row
and 12 in each column. Exercise 18 explains how to obtain similar schemes that
will compress records having, say, m = 4r bits into addresses having n = 3r bits.
In practice, buckets of size b would be used, and we would take $N \approx 2^n b$; the
case b = 1 has been used in the discussion above for simplicity in exposition.

Rivest has also suggested another simple way to handle basic queries. Suppose
we have, say, $N \approx 2^{10}$ records of 30 bits each, where we wish to answer
arbitrary 30-bit basic queries like (10). Then we can simply divide the 30 bits
into three 10-bit fields, and keep three separate hash tables of size $M = 2^{10}$. Each
record is stored thrice, in lists corresponding to its bit configurations in the three
fields. Under suitable conditions, each list will contain about one element. Given
a basic query with k unspecified bits, at least one of the fields will have $\lfloor k/3 \rfloor$ or
fewer bits unspecified; hence we need to look in at most $2^{\lfloor k/3 \rfloor} \approx N^{k/30}$ of the
lists to find all answers to the query. Or we could use any other technique for
handling basic queries in the selected field.
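Here is a minimal Python sketch of the three-field scheme, assuming records are given as 30-character strings of 0's and 1's (the function names and the use of dictionaries as hash tables are ours). Each record goes into three tables, and a query picks the field with the fewest *'s, enumerates that field's possible values, and filters the records it finds.

```python
from collections import defaultdict
from itertools import product

FIELDS = [(0, 10), (10, 20), (20, 30)]   # three 10-bit fields of a 30-bit record


def build(records):
    """Store each record thrice, once in each field's table."""
    tables = [defaultdict(list) for _ in FIELDS]
    for record in records:
        for table, (lo, hi) in zip(tables, FIELDS):
            table[record[lo:hi]].append(record)
    return tables


def basic_query(tables, query):
    """Answer a 30-character query of 0, 1, * using the field with fewest *'s."""
    f = min(range(len(FIELDS)),
            key=lambda i: query[FIELDS[i][0]:FIELDS[i][1]].count("*"))
    lo, hi = FIELDS[f]
    choices = ["01" if c == "*" else c for c in query[lo:hi]]
    answers = []
    for key in product(*choices):          # at most 2**floor(k/3) lists
        for record in tables[f].get("".join(key), []):
            if all(q in ("*", b) for q, b in zip(query, record)):
                answers.append(record)
    return answers
```

The final filter is needed because a list holds every record that agrees with the query in the chosen field, whatever its other 20 bits may be.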
Generalized tries. Rivest went on to suggest yet another approach, based
on a data structure like the tries in Section 6.3. We can let each internal node
of a generalized binary trie specify which bit of the record it represents. For
example, in the data of Table 1 we could let the root of the trie represent Vanilla
extract; then the left subtrie would correspond to those 16 cookie recipes that
omit Vanilla extract, while the right subtrie would be for the 16 that use it. This
16-16 split nicely bisects the file; and we can handle each subfile in a similar way.
When a subfile becomes suitably small, we represent it by a terminal node.

To process a basic query, we start at the root of the trie. When searching a
generalized trie whose root specifies an attribute where the query has 0 or 1, we
search the left or right subtrie, respectively; and if the query has * in that bit
position, we search both subtries.
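A Python sketch of a generalized binary trie follows, with records represented as strings of 0's and 1's. The rule for choosing the bit tested at each node (the still-unused bit that splits the current subfile most evenly, in the spirit of the Vanilla-extract example) and the leaf size of 2 are assumptions made for illustration; the search follows the rule just stated, descending into both subtries on a *.

```python
def build_trie(records, bits, leaf_size=2):
    """Generalized binary trie: each internal node names the bit it tests.

    Choosing the unused bit that splits the subfile most evenly is an
    assumed heuristic for this sketch.
    """
    if len(records) <= leaf_size or not bits:
        return ("leaf", records)

    def imbalance(b):
        ones = sum(r[b] == "1" for r in records)
        return abs(2 * ones - len(records))

    b = min(bits, key=imbalance)
    rest = [x for x in bits if x != b]
    zeros = [r for r in records if r[b] == "0"]
    ones = [r for r in records if r[b] == "1"]
    return ("node", b, build_trie(zeros, rest, leaf_size), build_trie(ones, rest, leaf_size))


def search(trie, query):
    """Answer a basic query; a * at the node's bit means search both subtries."""
    if trie[0] == "leaf":
        return [r for r in trie[1] if all(q in ("*", c) for q, c in zip(query, r))]
    _, b, left, right = trie
    found = []
    if query[b] in "0*":
        found += search(left, query)
    if query[b] in "1*":
        found += search(right, query)
    return found
```

Terminal nodes still compare the query against each stored record, since they may hold records that differ in bits never tested on the way down.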

Suppose the attributes are not binary, but they are represented in binary
notation. We can build a trie by looking first at the first bit of attribute 1, then
the first bit of attribute 2, ..., the first bit of attribute m, then the second bit of
attribute 1, etc. Such a structure is called an “m-d trie,” by analogy with m-d
trees (which branch by comparisons instead of by bit inspections). P. Flajolet
and C. Puech have shown that the average time to answer a partial match query
in a random m-d trie of N nodes is $\Theta(N^{k/m})$ when k of the m attributes are
unspecified [JACM 33 (1986), 371-407, §4.1]; the variance has been calculated
by W. Schachinger, Random Structures & Algorithms 7 (1995), 81-95.

Similar algorithms can be developed for m-dimensional versions of the digital
search trees and Patricia trees of Section 6.3. These structures, which tend to be
slightly better balanced than m-d tries, have been analyzed by P. Kirschenhofer
and H. Prodinger, Random Structures & Algorithms 5 (1994), 123-134.

*Balanced filing schemes. Another combinatorial approach to information
retrieval, based on balanced incomplete block designs, has been the subject of
considerable investigation. Although the subject is quite interesting from a
mathematical point of view, it has unfortunately not yet proved to be more
useful than the other methods described above. A brief introduction to the
theory will be presented here in order to indicate the flavor of the results, in
hopes that readers might think of good ways to put the ideas to practical use.

A Steiner triple system is an arrangement of v objects into unordered triples
in such a way that every pair of objects occurs in exactly one triple. For example,
when v = 7 there is essentially only one Steiner triple system, namely

    Triple      Pairs included

    {1,2,4}     {1,2}, {1,4}, {2,4}
    {2,3,5}     {2,3}, {2,5}, {3,5}
    {3,4,6}     {3,4}, {3,6}, {4,6}
    {4,5,0}     {0,4}, {0,5}, {4,5}          (16)
    {5,6,1}     {1,5}, {1,6}, {5,6}
    {6,0,2}     {0,2}, {0,6}, {2,6}
    {0,1,3}     {0,1}, {0,3}, {1,3}
Since there are $\frac12 v(v-1)$ pairs of objects and three pairs per triple, there must
be $\frac16 v(v-1)$ triples in all; and since each object must be paired with $v - 1$ others,
each object must appear in exactly $\frac12 (v-1)$ triples. These conditions imply that
a Steiner triple system can’t exist unless $\frac16 v(v-1)$ and $\frac12 (v-1)$ are integers, and
this is equivalent to saying that v is odd and not congruent to 2 modulo 3; thus

    v mod 6 = 1 or 3.          (17)

Conversely, T. P. Kirkman proved in 1847 that Steiner triple systems do exist for
all v > 1 such that (17) holds. His interesting construction is given in exercise 10.
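The defining property of (16) and the divisibility argument behind (17) are easy to check mechanically. The Python sketch below (function names ours) builds the seven triples of (16) by adding i mod 7 to {1, 2, 4}, tests that every pair lies in exactly one triple, and confirms that the two integrality conditions hold exactly when v mod 6 is 1 or 3. It checks only the necessary condition and does not attempt Kirkman's construction.

```python
from itertools import combinations


def cyclic_triples(v=7, base=(1, 2, 4)):
    """Add i (mod v) to a base triple; with the defaults this gives (16)."""
    return [frozenset((x + i) % v for x in base) for i in range(v)]


def is_steiner_triple_system(v, triples):
    """Check that every pair of objects 0..v-1 lies in exactly one triple."""
    count = dict.fromkeys(combinations(range(v), 2), 0)
    for triple in triples:
        for pair in combinations(sorted(triple), 2):
            count[pair] += 1
    return all(c == 1 for c in count.values())


def counts_are_integers(v):
    """Necessary condition: v(v-1)/6 triples, each object in (v-1)/2 of them."""
    return v * (v - 1) % 6 == 0 and (v - 1) % 2 == 0


print(is_steiner_triple_system(7, cyclic_triples()))         # True
print([v for v in range(1, 30) if counts_are_integers(v)])   # [1, 3, 7, 9, 13, 15, 19, 21, 25, 27]
```

Shifting {1, 2, 4} works because its pairwise differences 1, 2, 3 and their negatives 6, 5, 4 cover each nonzero residue mod 7 exactly once.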

Steiner triple systems can be used to reduce the redundancy of combined
inverted file indexes. For example, consider again the cookie recipe file of Table 1,
and convert the rightmost column into a 31st attribute that is 1 if any special
ingredients are necessary, 0 otherwise. Assume that we want to answer all
inclusive queries on pairs of attributes, such as “What recipes use both coconut
and raisins?” We could make up an inverted list for each of the $\binom{31}{2} = 465$
possible queries. But it would turn out that this takes a lot of space since
Pfeffernuesse (for example) would appear in $\binom{17}{2} = 136$ of the lists, and a record
with all 31 attributes would appear in every list! A Steiner triple system can be
used to make a slight improvement in this situation. There is a Steiner triple
system on 31 objects, with 155 triples and each pair of objects occurring in
exactly one of the triples. We can associate four lists with each triple {a, b, c}:
one list for all records having attributes a and b but not c; another for a and c
but not b; another for b and c but not a; and another for records having all three
attributes a, b, c. This guarantees that no record will be included in more than
155 of the inverted lists, and it saves space whenever a record has three attributes
that correspond to a triple of the system.
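The four-lists-per-triple organization can be sketched in Python on the small system (16), which stands in here for the 155 triples on 31 attributes; the record names and function names are hypothetical. A record is filed, for each triple, under the exact subset of that triple's attributes it has, whenever that subset has two or three elements; a pair query then reads just two lists of the one triple that contains the pair.

```python
from collections import defaultdict

# The seven triples of (16), standing in for the 155 triples on 31 attributes.
TRIPLES = [frozenset(t) for t in
           ({1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}, {0, 1, 3})]


def build_lists(records):
    """File each record, per triple, under the subset of that triple's attributes
    it has, but only when the subset has two or three elements."""
    lists = defaultdict(list)
    for name, attributes in records.items():
        for triple in TRIPLES:
            present = triple & attributes
            if len(present) >= 2:
                lists[(triple, present)].append(name)
    return lists


def pair_query(lists, x, y):
    """Records having both x and y: two lists of the unique triple holding {x, y}."""
    (triple,) = [t for t in TRIPLES if x in t and y in t]
    return sorted(lists.get((triple, frozenset({x, y})), [])
                  + lists.get((triple, triple), []))


records = {"r1": {0, 1, 3}, "r2": {1, 2}, "r3": set(range(7)), "r4": {5}, "r5": {2, 4, 6}}
lists = build_lists(records)
print(pair_query(lists, 1, 3))   # ['r1', 'r3']
```

A record with every attribute, like r3, lands in exactly one list per triple, which is the bound of one list per triple stated above.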

Triple systems are special cases of block designs that have blocks of three or
more objects. For example, there is a way to arrange 31 objects into sextuples
so that every pair of objects appears in exactly one sextuple:

    {0,4,16,21,22,24}, {1,5,17,22,23,25}, ..., {30,3,15,20,21,23}.          (18)

(This design is formed from the first block by addition mod 31. To verify that
it has the stated property, note that the 30 values $(a_i - a_j) \bmod 31$, for $i \ne j$,
are distinct, where $(a_1, a_2, \ldots, a_6) = (0, 4, 16, 21, 22, 24)$. To find the sextuple
containing a pair (x, y), choose i and j such that $a_i - a_j \equiv x - y$ (modulo 31);
now if $k = (x - a_i) \bmod 31$, we have $(a_i + k) \bmod 31 = x$ and $(a_j + k) \bmod 31 = y$.)
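The verification and the lookup rule in the parenthetical remark above translate directly into Python. The sketch below (names ours) generates the 31 sextuples of (18), checks that the 30 differences are distinct, and finds the block containing a given pair by the stated recipe.

```python
from itertools import combinations

BASE = (0, 4, 16, 21, 22, 24)
V = 31


def blocks():
    """The 31 sextuples of (18): the first block plus k (mod 31), k = 0..30."""
    return [frozenset((a + k) % V for a in BASE) for k in range(V)]


def differences_are_distinct():
    diffs = [(ai - aj) % V for ai in BASE for aj in BASE if ai != aj]
    return len(set(diffs)) == len(diffs) == 30


def block_containing(x, y):
    """Find a_i, a_j with a_i - a_j = x - y (mod 31), then shift by k = x - a_i."""
    for ai in BASE:
        for aj in BASE:
            if ai != aj and (ai - aj) % V == (x - y) % V:
                k = (x - ai) % V
                return frozenset((a + k) % V for a in BASE)
    raise ValueError("x and y must be distinct")


def every_pair_once():
    all_blocks = blocks()
    return all(sum(x in b and y in b for b in all_blocks) == 1
               for x, y in combinations(range(V), 2))


print(differences_are_distinct(), every_pair_once())   # True True
print(sorted(block_containing(19, 12)))                 # [4, 9, 10, 12, 19, 23]
```

Since the differences are distinct, exactly one ordered pair (a_i, a_j) qualifies for each x and y, so the lookup needs no search over the blocks themselves.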

We can use the design above to store the inverted lists in such a way that
no record can appear more than 31 times. Each sextuple {a, b, c, d, e, f} is
associated with 57 lists, one for each of the $2^6 - 7 = 57$ ways in which a record can
have two or more of the attributes a, b, c, d, e, f (and lack the others); and the
answer to each inclusive 2-attribute query is the disjoint union of 16 appropriate
lists in the appropriate sextuple. For this design, Pfeffernuesse would be stored
in 29 of the 31 blocks, since that record has at least two of the six attributes in
all but blocks {19, 23, 4, 9, 10, 12} and {13, 17, 29, 3, 4, 6}, if we number the
columns from 0 to 30.
The theory of block designs and related patterns is developed in detail in
Marshall Hall, Jr.’s book Combinatorial Theory (Waltham, Mass.: Blaisdell,
1967). Although such combinatorial configurations are very beautiful, their main
application to information retrieval so far has been to decrease the redundancy
incurred when compound inverted lists are being used; and David K. Chow
[Information and Control 15 (1969), 377-396] has observed that this type of
decrease can be obtained even without using combinatorial designs.

A short history and bibliography. The first published article dealing with a
technique for secondary key retrieval was by L. R. Johnson in CACM 4 (1961),
218-222. The multilist system was developed independently by Noah S. Prywes,
H. J. Gray, W. I. Landauer, D. Lefkowitz, and S. Litwin at about the same
time; see IEEE Trans. on Communication and Electronics 82 (1963), 488-492.
Another rather early publication that influenced later work was by D. R. Davis
and A. D. Lin, CACM 8 (1965), 243-246.

Since then a large literature on the subject grew up rapidly, but much of
it dealt with the user interface and with programming language considerations,
which are not within the scope of this book. In addition to the papers already
cited, the following published articles were found to be most helpful to the
author as this section was first being written in 1972: Jack Minker and Jerome
Sable, Ann. Rev. of Information Science and Technology 2 (1967), 123-160;
Robert E. Bleier, Proc. ACM Nat. Conf. 22 (1967), 41-49; Jerome A. Feldman
and Paul D. Rovner, CACM 12 (1969), 439-449; Burton H. Bloom, Proc. ACM
Nat. Conf. 24 (1969), 83-95; H. S. Heaps and L. H. Thiel, Information Storage
and Retrieval 6 (1970), 137-153; Vincent Y. Lum and Huei Ling, Proc. ACM
Nat. Conf. 26 (1971), 349-356. A good survey of manual card-filing systems
appears in Methods of Information Handling by C. P. Bourne (New York: Wiley,
1963), Chapter 5. Balanced filing schemes were originally developed by C. T.
Abraham, S. P. Ghosh, and D. K. Ray-Chaudhuri in 1965; see the article by
R. C. Bose and Gary G. Koch, SIAM J. Appl. Math. 17 (1969), 1203-1214.

Most of the classical algorithms for multi-attribute data that are known to
be of practical importance have been discussed above; but a few more topics
are planned for the next edition of this book, including the following:

• E. M. McCreight introduced priority search trees [SICOMP 14 (1985), 257-276],
which are specially designed to represent intersections of dynamically
changing families of intervals, and to handle range queries of the form “Find
all records with $x_0 \le x \le x_1$ and $y \le y_1$.” (Notice that the lower bound
on y must be $-\infty$, but x can be bounded on both sides.)

• M. L. Fredman has proved several fundamental lower bounds, which show
that a sequence of N intermixed insertions, deletions, and k-dimensional
range queries must take $\Omega(N(\log N)^k)$ operations in the worst case, regardless
of the data structure being used. See JACM 28 (1981), 696-705;
SICOMP 10 (1981), 1-10; J. Algorithms 2 (1981), 77-87.

Basic algorithms for pattern matching and approximate pattern matching in text
strings will be discussed in Chapter 9.
It is interesting to note that the human brain is much better at secondary key
retrieval than computers are; in fact, people find it rather easy to recognize faces
or melodies from only fragmentary information, while computers have barely
been able to do this at all. Therefore it is not unlikely that a completely new
approach to machine design will someday be discovered that solves the problem
of secondary key retrieval once and for all, making this entire section obsolete.

EXERCISES 

► 1. [M27] Let $0 \le k \le n/2$. Prove that the following construction produces $\binom{n}{k}$
permutations of $\{1, 2, \ldots, n\}$ such that every $t$-element subset of $\{1, 2, \ldots, n\}$ appears
as the first $t$ elements of at least one of the permutations, for $t \le k$ or $t \ge n - k$:
Consider a path in the plane from $(0, 0)$ to $(n, r)$ where $r \ge n - 2k$, in which the $i$th
step is from $(i-1, j)$ to $(i, j+1)$ or to $(i, j-1)$; the latter possibility is allowed only if
$j \ge 1$, so that the path never goes below the $x$ axis. There are exactly $\binom{n}{k}$ such paths.
For each path of this kind, a permutation is constructed as follows, using three lists
that are initially empty: For $i = 1, 2, \ldots, n$, if the $i$th step of the path goes up, put
the number $i$ into list B; if the step goes down, put $i$ into list A and move the currently
largest element of list B into list C. The resulting permutation is equal to the final
contents of list A, then list B, then list C, each list in increasing order.

For example, when $n = 4$ and $k = 2$, the six paths and permutations defined by
this procedure are

[Figure: the six lattice paths]

|1 2 3 4|    2|3 4|1    2 4||1 3    3|1 4|2    3 4||1 2    4|1 2|3

(Vertical lines show the division between lists A, B, and C. These six permutations
correspond to the compound attributes in (8).)

Hint: Represent each $t$-element subset $S$ by a path that goes from $(0, 0)$ to
$(n, n - 2t)$, whose $i$th step runs from $(i-1, j)$ to $(i, j+1)$ if $i \notin S$ and to $(i, j-1)$ if
$i \in S$. Convert every such path into an appropriate path having the special form
stated above.
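The construction of exercise 1 is easy to try on small cases. The Python sketch below does that, using hypothetical helpers `path_permutations` and `covers_prefix_subsets` that are not from the text. It enumerates the admissible paths, fills lists A, B, and C exactly as described, and then checks the prefix-subset property by brute force. Passing such a check for small $n$ is evidence for the claim, not a proof of it.

```python
from itertools import combinations, product
from math import comb


def path_permutations(n, k):
    """Permutations of 1..n built from the lattice paths of exercise 1."""
    perms = []
    for steps in product((+1, -1), repeat=n):      # +1 = up step, -1 = down step
        height, admissible = 0, True
        for s in steps:
            if s < 0 and height < 1:                # a down step needs j >= 1
                admissible = False
                break
            height += s
        if not admissible or height < n - 2 * k:    # the path must end at r >= n - 2k
            continue
        A, B, C = [], [], []
        for i, s in enumerate(steps, start=1):
            if s > 0:
                B.append(i)                         # up: i goes into list B
            else:
                A.append(i)                         # down: i goes into list A, and the
                C.append(B.pop())                   # largest element of B moves to C
                # (B is always in increasing order, so pop() takes its largest element)
        perms.append(tuple(sorted(A) + sorted(B) + sorted(C)))
    return perms


def covers_prefix_subsets(perms, n, k):
    """Check that every t-subset is a prefix set whenever t <= k or t >= n - k."""
    for t in range(n + 1):
        if t <= k or t >= n - k:
            prefixes = {frozenset(p[:t]) for p in perms}
            if any(frozenset(s) not in prefixes
                   for s in combinations(range(1, n + 1), t)):
                return False
    return True


# Usage: the n = 4, k = 2 example of the text.
example = path_permutations(4, 2)
assert len(example) == comb(4, 2) and covers_prefix_subsets(example, 4, 2)
```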

2. [M25] (Sakti P. Ghosh.) Find the minimum possible length $l$ of a list $r_1 r_2 \ldots r_l$
of references to records, such that the set of all responses to any of the inclusive queries
**1, *1*, 1**, *11, 1*1, 11*, 111 on three binary-valued secondary keys will appear in
consecutive locations $r_i \ldots r_j$.

3. [19] In Table 2, what inclusive queries will cause (a) Old-Fashioned Sugar Cookies,
(b) Oatmeal-Date Bars, to be obtained among the false drops?

4. [M30] Find exact formulas for the probabilities in (11), assuming that each record
has $r$ distinct attributes chosen randomly from among the $\binom{n}{k}$ $k$-bit codes in an $n$-bit
field and that the query involves $q$ distinct but otherwise random attributes. (Don't
be alarmed if the formulas do not simplify.)

5. [40] Experiment with various ways to avoid the redundancy of text when using
Harrison's technique for substring searching.

► 6. [M20] The total number of $m$-bit basic queries with $t$ bits specified is $s = \binom{m}{t}2^t$.
If a combinatorial hashing function like that in (13) converts these queries into $l_1, l_2,
\ldots, l_s$ locations, respectively, $L(t) = (l_1 + l_2 + \cdots + l_s)/s$ is the average number of
locations per query. [For example, in (13) we have $L(3) = 1.75$.]

Consider now a composite hash function on an $(m_1 + m_2)$-bit field, formed by
mapping the first $m_1$ bits with one hash function and the remaining $m_2$ with another,
where $L_1(t)$ and $L_2(t)$ are the corresponding average numbers of locations per query.
Find a formula that expresses $L(t)$, for the composite function, in terms of $L_1$ and $L_2$.

7. [M24] (R. L. Rivest.) Find the functions $L(t)$, as defined in the previous exercise,
for the following combinatorial hash functions:

(a) $m = 3$, $n = 2$:

    0 0 * → 0
    1 * 0 → 1
    * 1 1 → 2
    1 0 1 → 3
    0 1 0 → 3

(b) $m = 4$, $n = 2$:

    0 0 * * → 0
    * 1 * 0 → 1
    * 1 1 1 → 2
    1 0 1 * → 2
    * 1 0 1 → 3
    1 0 0 * → 3
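For tables this small, $L(t)$ as defined in exercise 6 can be computed directly. Each record is sent to its list, and then every basic query with $t$ specified bits is enumerated. The sketch below does this in Python. The names `matches` and `average_locations` are hypothetical, and each table row is encoded as a (pattern, list) pair. Exact fractions are used so that averages such as 1.75 are not rounded.

```python
from fractions import Fraction
from itertools import combinations, product

# The two tables of exercise 7, written as (pattern, list number) pairs.
TABLE_A = [("00*", 0), ("1*0", 1), ("*11", 2), ("101", 3), ("010", 3)]
TABLE_B = [("00**", 0), ("*1*0", 1), ("*111", 2), ("101*", 2), ("*101", 3), ("100*", 3)]


def matches(pattern, bits):
    """True if bits agrees with pattern in every position where pattern is not '*'."""
    return all(p == "*" or p == b for p, b in zip(pattern, bits))


def average_locations(table, m, t):
    """L(t): mean number of lists examined by a basic query with t specified bits."""
    list_of = {}
    for bits in map("".join, product("01", repeat=m)):
        hits = [lst for pat, lst in table if matches(pat, bits)]
        if len(hits) != 1:
            raise ValueError(f"record {bits} is not covered by exactly one row")
        list_of[bits] = hits[0]
    total = count = 0
    for positions in combinations(range(m), t):
        for values in product("01", repeat=t):
            query = ["*"] * m
            for p, v in zip(positions, values):
                query[p] = v
            total += len({lst for bits, lst in list_of.items() if matches(query, bits)})
            count += 1
    return Fraction(total, count)
```

Brute force costs about $3^m \cdot 2^m$ pattern tests, which is fine for tables like these but not for large $m$.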


8. [M32] (R. L. Rivest.) Consider the set $Q_{t,m}$ of all $2^t\binom{m}{t}$ basic $m$-bit queries
like (10) in which there are exactly $t$ specified bits. Given a set $S$ of $m$-bit records,
let $f_t(S)$ denote the number of queries in $Q_{t,m}$ whose answer contains a member of $S$;
and let $f_t(s, m)$ be the minimum $f_t(S)$ over all such sets $S$ having $s$ elements, for
$0 \le s \le 2^m$. By convention, $f_t(0, 0) = 0$ and $f_t(1, 0) = \delta_{t0}$.

a) Prove that, for all $t \ge 1$ and $m \ge 1$, and for $0 \le s \le 2^m$,

$$f_t(s, m) = f_t(\lceil s/2 \rceil, m-1) + f_{t-1}(\lceil s/2 \rceil, m-1) + f_{t-1}(\lfloor s/2 \rfloor, m-1).$$

b) Consider any combinatorial hash function $h$ from the $2^m$ possible records to
$2^n$ lists, with each list corresponding to $2^{m-n}$ records. If each of the queries in
$Q_{t,m}$ is equally likely, the average number of lists that need to be examined per
query is $1/\bigl(2^t\binom{m}{t}\bigr)$ times

$$\sum_{Q \in Q_{t,m}} (\text{lists examined for } Q) = \sum_{\text{lists } S} (\text{queries of } Q_{t,m} \text{ relevant to } S) \ge 2^n f_t(2^{m-n}, m).$$

Show that $h$ is optimal, in the sense that this lower bound is achieved, when each
of the lists is a "subcube"; in other words, show that equality holds in the case
when each list corresponds to a set of records that satisfies some basic query with
exactly $n$ specified bits.
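For small $m$, the recurrence of part (a) can be compared with a brute-force minimum over all $s$-element record sets. The Python sketch below does this. The names `f_rec` and `f_brute` are hypothetical, and records are encoded as integers with the first bit most significant. For $t = 0$ the sketch uses the fact that $Q_{0,m}$ contains only the all-`*` query.

```python
from functools import lru_cache
from itertools import combinations, product


@lru_cache(maxsize=None)
def f_rec(t, s, m):
    """f_t(s, m) computed from the recurrence of exercise 8(a)."""
    if t == 0:
        return 1 if s > 0 else 0          # Q_{0,m} holds only the all-* query
    if m == 0:
        return 0                           # f_t(0,0) = 0 and f_t(1,0) = delta_{t0} = 0
    hi, lo = (s + 1) // 2, s // 2          # ceil(s/2) and floor(s/2)
    return f_rec(t, hi, m - 1) + f_rec(t - 1, hi, m - 1) + f_rec(t - 1, lo, m - 1)


def f_brute(t, s, m):
    """Minimum over all s-element sets S of m-bit records of the queries S answers."""
    queries = [(pos, vals) for pos in combinations(range(m), t)
               for vals in product((0, 1), repeat=t)]

    def answered(query, record):
        pos, vals = query
        return all(((record >> (m - 1 - p)) & 1) == v for p, v in zip(pos, vals))

    return min(sum(any(answered(q, r) for r in S) for q in queries)
               for S in combinations(range(2 ** m), s))
```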

9. [M20] Prove that when $v = 3^n$, the set of all triples of the form

$$\{(a_1 \ldots a_{k-1}\,0\,b_1 \ldots b_{n-k})_3,\ (a_1 \ldots a_{k-1}\,1\,c_1 \ldots c_{n-k})_3,\ (a_1 \ldots a_{k-1}\,2\,d_1 \ldots d_{n-k})_3\},$$

$1 \le k \le n$, forms a Steiner triple system, where the a's, b's, c's, and d's range over all
combinations of 0s, 1s, and 2s such that $b_j + c_j + d_j \equiv 0$ (modulo 3) for $1 \le j \le n - k$.
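The triples of exercise 9 can be generated and checked mechanically for small $n$. The Python sketch below does this with hypothetical helpers `ternary_triples` and `is_steiner`. It builds each triple from its radix-3 digits, choosing $d_j$ so that $b_j + c_j + d_j \equiv 0$. It then confirms that every pair of the $3^n$ objects lies in exactly one triple.

```python
from collections import Counter
from itertools import combinations, product


def to_int(digits):
    """Value of a radix-3 digit tuple, most significant digit first."""
    value = 0
    for d in digits:
        value = 3 * value + d
    return value


def ternary_triples(n):
    """The triples of exercise 9 on the v = 3**n objects 0, 1, ..., v - 1."""
    triples = set()
    for k in range(1, n + 1):
        for a in product(range(3), repeat=k - 1):
            for b in product(range(3), repeat=n - k):
                for c in product(range(3), repeat=n - k):
                    # choose d so that b_j + c_j + d_j = 0 (mod 3)
                    d = tuple((-bj - cj) % 3 for bj, cj in zip(b, c))
                    triples.add(frozenset((to_int(a + (0,) + b),
                                           to_int(a + (1,) + c),
                                           to_int(a + (2,) + d))))
    return triples


def is_steiner(triples, points):
    """True if every pair of distinct points lies in exactly one triple."""
    count = Counter(p for t in triples for p in combinations(sorted(t), 2))
    pairs = list(combinations(sorted(points), 2))
    return (all(len(t) == 3 for t in triples)
            and all(count[p] == 1 for p in pairs)
            and sum(count.values()) == len(pairs))
```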

10. [M32] (Thomas P. Kirkman, Cambridge and Dublin Math. Journal 2 (1847),
191-204.) Let us say that a Kirkman triple system of order $v$ is an arrangement of
$v + 1$ objects $\{x_0, x_1, \ldots, x_v\}$ into triples such that every pair $\{x_i, x_j\}$ for $i \ne j$ occurs
in exactly one triple, except that the $v$ pairs $\{x_i, x_{(i+1) \bmod v}\}$ do not ever occur in the
same triple, for $0 \le i < v$. For example,

$$\{x_0, x_2, x_4\},\quad \{x_1, x_3, x_4\}$$

is a Kirkman triple system of order 4.
a) Prove that a Kirkman triple system can exist only when $v \bmod 6 = 0$ or 4.

b) Given a Steiner triple system $S$ on $v$ objects $\{x_1, \ldots, x_v\}$, prove that the following
construction yields another Steiner system $S'$ on $2v + 1$ objects and a Kirkman
triple system $K'$ of order $2v - 2$: The triples of $S'$ are those of $S$ plus

i) $\{x_i, y_j, y_k\}$ where $j + k \equiv i$ (modulo $v$) and $j < k$, $1 \le i, j, k \le v$;

ii) $\{x_i, y_j, z\}$ where $2j \equiv i$ (modulo $v$), $1 \le i, j \le v$.

The triples of $K'$ are those of $S'$ minus all those containing $y_1$ and/or $y_v$.

c) Given a Kirkman triple system $K$ on $\{x_0, x_1, \ldots, x_v\}$, where $v = 2u$, prove that
the following construction yields a Steiner triple system $S'$ on $2v + 1$ objects and
a Kirkman triple system $K'$ of order $2v - 2$: The triples of $S'$ are those of $K$ plus

i) $\{x_i, x_{(i+1) \bmod v}, y_{i+1}\}$, $0 \le i < v$;

ii) $\{x_i, y_j, y_k\}$, $j + k \equiv 2i + 1$ (modulo $v - 1$), $1 \le j < k - 1 \le v - 2$, $1 \le i \le v - 2$;

iii) $\{x_i, y_j, y_v\}$, $2j \equiv 2i + 1$ (modulo $v - 1$), $1 \le j \le v - 1$, $1 \le i \le v - 2$;

iv) $\{x_0, y_{2j}, y_{2j+1}\}$, $\{x_{v-1}, y_{2j-1}, y_{2j}\}$, $\{x_v, y_j, y_{v-j}\}$, for $1 \le j < u$;

v) $\{x_v, y_u, y_v\}$.

The triples of $K'$ are those of $S'$ minus all those containing $y_1$ and/or $y_{v-1}$.

d) Use the preceding results to prove that Kirkman triple systems of order $v$ exist for
all $v > 0$ of the form $6k$ or $6k + 4$, and Steiner triple systems on $v$ objects exist
for all $v > 1$ of the form $6k + 1$ or $6k + 3$.
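The Steiner half of constructions (b) and (c) can be checked on small inputs. The Python sketch below does so under several assumptions. Objects are labeled by strings such as "x3", "y1", and "z". Construction (b) is run on the one-triple system on $\{x_1, x_2, x_3\}$, and then again on its own relabeled output. Construction (c) is run on the order-4 Kirkman system shown above. The sketch checks only that each $S'$ is a Steiner triple system; it does not check the Kirkman property of $K'$. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_steiner(triples, points):
    """True if every pair of distinct points lies in exactly one triple."""
    count = Counter(p for t in triples for p in combinations(sorted(t), 2))
    pairs = list(combinations(sorted(points), 2))
    return (all(len(t) == 3 for t in triples)
            and all(count[p] == 1 for p in pairs)
            and sum(count.values()) == len(pairs))


def construction_b(S, v):
    """Exercise 10(b): from a Steiner triple system S on x1..xv (v odd), build S'."""
    result = [frozenset(t) for t in S]
    for j, k in combinations(range(1, v + 1), 2):        # (i): j + k = i (mod v), j < k
        i = (j + k - 1) % v + 1
        result.append(frozenset({f"x{i}", f"y{j}", f"y{k}"}))
    for j in range(1, v + 1):                             # (ii): 2j = i (mod v)
        i = (2 * j - 1) % v + 1
        result.append(frozenset({f"x{i}", f"y{j}", "z"}))
    return result


def construction_c(K, v):
    """Exercise 10(c): from a Kirkman triple system K on x0..xv, v = 2u, build S'."""
    u = v // 2
    result = [frozenset(t) for t in K]
    for i in range(v):                                    # (i)
        result.append(frozenset({f"x{i}", f"x{(i + 1) % v}", f"y{i + 1}"}))
    for i in range(1, v - 1):
        for j in range(1, v):
            for k in range(j + 2, v):                     # (ii): 1 <= j < k-1 <= v-2
                if (j + k - 2 * i - 1) % (v - 1) == 0:
                    result.append(frozenset({f"x{i}", f"y{j}", f"y{k}"}))
            if (2 * j - 2 * i - 1) % (v - 1) == 0:        # (iii)
                result.append(frozenset({f"x{i}", f"y{j}", f"y{v}"}))
    for j in range(1, u):                                 # (iv)
        result += [frozenset({"x0", f"y{2 * j}", f"y{2 * j + 1}"}),
                   frozenset({f"x{v - 1}", f"y{2 * j - 1}", f"y{2 * j}"}),
                   frozenset({f"x{v}", f"y{j}", f"y{v - j}"})]
    result.append(frozenset({f"x{v}", f"y{u}", f"y{v}"}))  # (v)
    return result


# Usage: 3 -> 7 -> 15 objects with (b), and 5 -> 9 objects with (c).
points7 = ["x1", "x2", "x3", "y1", "y2", "y3", "z"]
S7 = construction_b([{"x1", "x2", "x3"}], 3)
relabel = {p: f"x{n}" for n, p in enumerate(points7, start=1)}
S15 = construction_b([{relabel[p] for p in t} for t in S7], 7)
points15 = [f"x{i}" for i in range(1, 8)] + [f"y{j}" for j in range(1, 8)] + ["z"]
S9 = construction_c([{"x0", "x2", "x4"}, {"x1", "x3", "x4"}], 4)
points9 = [f"x{i}" for i in range(5)] + [f"y{j}" for j in range(1, 5)]
```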

11. [M25] The text describes the use of Steiner triple systems in connection with
inclusive queries; in order to extend this to all basic queries it is natural to define
the following concept. A complemented triple system of order $v$ is an arrangement of
$2v$ objects $\{x_1, \ldots, x_v, \bar x_1, \ldots, \bar x_v\}$ into triples such that every pair of objects occurs
together in exactly one triple, except that complementary pairs $\{x_i, \bar x_i\}$ never occur
together. For example,

$$\{x_1, x_2, x_3\},\quad \{x_1, \bar x_2, \bar x_3\},\quad \{\bar x_1, x_2, \bar x_3\},\quad \{\bar x_1, \bar x_2, x_3\}$$

is a complemented triple system of order three.

Prove that complemented triple systems of order $v$ exist for all $v > 0$ not of the
form $3k + 2$.

12. [M28] Continuing exercise 11, construct a complemented quadruple system of
order 7.

13. [M25] Construct quadruple systems with $v = 4^n$ elements, analogous to the triple
system of exercise 9.

14. [28] Discuss the problem of deleting nodes from quadtrees, k-d trees, and
post-office trees like Fig. 45.

15. [HM30] (P. Elias.) Given a large collection of $m$-bit records, suppose we want to
find a record closest to a given search argument, in the sense that it agrees in the most
bits. Devise an algorithm for solving this problem efficiently, assuming that an $m$-bit
$t$-error-correcting code of $2^n$ elements is given, and that each record has been hashed
onto one of $2^n$ lists corresponding to the nearest codeword.

► 16. [25] (W. H. Kautz and R. C. Singleton.) Show that a Steiner triple system of
order $v$ can be used to construct $v(v-1)/6$ codewords of $v$ bits each such that no
codeword is contained in the superposition of any two others.
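One natural way to realize exercise 16 gives each triple a codeword with 1s in its three positions. The Python sketch below builds these codewords from the Fano plane, which is a Steiner triple system on 7 points and is used here only as an assumed example. It then checks by brute force that no codeword lies inside the bitwise OR of two others. The check succeeds because two distinct triples share at most one point. The helper names are hypothetical.

```python
from itertools import combinations

# The Fano plane: a Steiner triple system on the seven points 0..6.
FANO = [{0, 1, 3}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}]


def codewords(triples):
    """One codeword per triple, with 1 bits exactly at the triple's positions."""
    return [sum(1 << p for p in t) for t in triples]


def no_word_covered_by_two_others(words):
    """True if no codeword is contained in the superposition (OR) of two others."""
    for i, w in enumerate(words):
        others = words[:i] + words[i + 1:]
        if any((w & (a | b)) == w for a, b in combinations(others, 2)):
            return False
    return True
```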
► 17. [M30] Consider the following way to reduce $(2n + 1)$-bit keys $a_{-n} \ldots a_0 \ldots a_n$ to
$(n + 1)$-bit bucket addresses $b_0 \ldots b_n$:

$b_0 \leftarrow a_0$;

if $b_{k-1} = 0$ then $b_k \leftarrow a_{-k}$ else $b_k \leftarrow a_k$, for $1 \le k \le n$.

a) Describe the keys that appear in bucket $b_0 \ldots b_n$.

b) What is the largest number of buckets that need to be examined, in a basic query
that has $t$ bits specified?
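The bucket rule of exercise 17 is easy to simulate. The Python sketch below is a minimal version in which a key is stored as a dict from index $-n..n$ to bit. `bucket` and `bucket_sizes` are hypothetical names. Enumerating all keys for small $n$ shows how the $2^{2n+1}$ keys are spread over the $2^{n+1}$ buckets. Once $b_0 \ldots b_n$ is fixed, the $n + 1$ bits it depends on are fixed too, so each bucket should receive exactly $2^n$ keys.

```python
from collections import Counter
from itertools import product


def bucket(key, n):
    """Exercise 17: key maps each index -n..n to a bit; returns (b_0, ..., b_n)."""
    b = [key[0]]
    for k in range(1, n + 1):
        b.append(key[-k] if b[k - 1] == 0 else key[k])
    return tuple(b)


def bucket_sizes(n):
    """Count how many of the 2**(2n+1) keys land in each bucket."""
    sizes = Counter()
    for bits in product((0, 1), repeat=2 * n + 1):
        key = dict(zip(range(-n, n + 1), bits))   # key[i] is a_i
        sizes[bucket(key, n)] += 1
    return sizes
```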

► 18. [M35] (Associative block designs.) A set of $m$-tuples like (13), with exactly $m - n$
*'s in each of $2^n$ rows, is called an ABD$(m, n)$ if every column contains the same
number of *'s and if every pair of rows has a "mismatch" (0 versus 1) in some column.
Every $m$-bit binary number will then match exactly one row. For example, (13) is an
ABD(4, 3).

a) Prove that an ABD$(m, n)$ is impossible unless $m$ is a divisor of $2^{n-1}n$ and
$n^2 > 2m(1 - 2^{-n})$.

b) A row of an ABD is said to have odd parity if it contains an odd number of 1s.
Show that, for every choice of $m - n$ columns in an ABD$(m, n)$, the number of
odd-parity rows with *'s in these columns equals the number of even-parity rows.
In particular, each pattern of asterisks must occur in an even number of rows.

c) Find an ABD(4, 3) that cannot be obtained from (13) by permuting and/or
complementing columns.

d) Construct an ABD(16, 9).

e) Construct an ABD(16, 10). Start with the ABD(16, 9) of part (d), instead of the
ABD(8, 5) of (15).

19. [M22] Analyze the ABD(8, 5) of (15), as (13) has been analyzed in (14): How
many of the 32 locations must be searched for an average query with $k$ bits unspecified?
How many must be searched in the worst case?

20.  [M/7]  Find  all  ABD(m, n)  when  n = 5 or  n = 6. 
A new Section 6.6 devoted to "persistent data structures" is planned for the next
edition of the present book. Persistent structures are able to represent changing
information in such a way that the past history can be reconstructed efficiently. In other
words, we might do many insertions and deletions, but we can still conduct searches
as if the updates after a given time had not been made. Relevant early references to
this topic include the following papers:

• J. K. Mullin, Comp. J. 24 (1981), 367-373;

• M. H. Overmars, Lecture Notes in Comp. Sci. 156 (1983), Chapter 9;

• E. W. Myers, ACM Symp. Principles of Prog. Lang. 11 (1984), 66-75;

• B. Chazelle, Information and Control 63 (1985), 77-99;

• D. Dobkin and J. I. Munro, J. Algorithms 6 (1985), 455-465;

• R. Cole, J. Algorithms 7 (1986), 202-220;

• D. Field, Information Processing Letters 24 (1987), 95-96;

• C. W. Fraser and E. W. Myers, ACM Trans. Prog. Lang. and Systems 9 (1987),
277-295;

• J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan, J. Comp. Syst. Sci. 38
(1989), 86-124;

• R. B. Dannenberg, Software Practice & Experience 20 (1990), 109-132;

• J. R. Driscoll, D. D. K. Sleator, and R. E. Tarjan, JACM 41 (1994), 943-959.
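One simple way to get the behavior described above is path copying, sketched below in Python. This is an illustrative technique chosen for brevity, not necessarily the method of any paper listed. Every insertion or deletion copies only the nodes on the search path and shares all other nodes, so each earlier root still describes the tree exactly as it was. The names `Node`, `insert`, `delete`, `contains`, and `history` are hypothetical, and the tree is an unbalanced binary search tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return a new version containing key; only the search path is copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root


def delete(root, key):
    """Return a new version without key; the old version is left untouched."""
    if root is None:
        return None
    if key < root.key:
        return Node(root.key, delete(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, delete(root.right, key))
    if root.left is None:
        return root.right
    if root.right is None:
        return root.left
    succ = root.right                      # replace by the inorder successor
    while succ.left is not None:
        succ = succ.left
    return Node(succ.key, root.left, delete(root.right, succ.key))


def contains(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


# Usage: keep every version; history[t] answers searches "as of time t".
history = [None]                           # version 0 is the empty tree
for op, k in [("insert", 5), ("insert", 2), ("insert", 8), ("delete", 5), ("insert", 7)]:
    history.append(insert(history[-1], k) if op == "insert" else delete(history[-1], k))
```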


Instruction tables [programs] will have to be made up
by mathematicians with computing experience
and perhaps a certain puzzle solving ability.
There will probably be a great deal of work of this kind to be done,
for every known process has got to be
translated into instruction table form at some stage. ...
This process of constructing instruction tables should be very fascinating.
There need be no real danger of it ever becoming a drudge,
for any processes that are quite mechanical
may be turned over to the machine itself.

— ALAN  M.  TURING  (1945) 
"I have answered three questions, and that is enough,"
Said his father, "don't give yourself airs!
Do you think I can listen all day to such stuff?
Be off, or I'll kick you down stairs!"

— LEWIS  CARROLL,  Alice’s  Adventures  Under  Ground  (1864) 


NOTES ON THE EXERCISES

1. An average problem for a mathematically inclined reader.

3. See W. J. LeVeque, Topics in Number Theory 2 (Reading, Mass.: Addison-Wesley,
1956), Chapter 3; P. Ribenboim, 13 Lectures on Fermat's Last Theorem (New York:
Springer-Verlag, 1979); A. Wiles, Annals of Mathematics (2) 141 (1995), 443-551.

SECTION  5 

1. Let $p(1) \ldots p(N)$ and $q(1) \ldots q(N)$ be different permutations satisfying the
conditions, and let $i$ be minimal with $p(i) \ne q(i)$. Then $p(i) = q(j)$ for some $j > i$, and
$q(i) = p(k)$ for some $k > i$. Since $K_{p(i)} \le K_{p(k)} = K_{q(i)} \le K_{q(j)} = K_{p(i)}$ we have
$K_{p(i)} = K_{q(i)}$; hence by stability $p(i) < p(k) = q(i) < q(j) = p(i)$, a contradiction.

2. Yes, if the sorting operations were all stable. (If they were not stable we cannot
say.) Alice and Chris certainly have the same result; and so does Bill, since the
stability shows that equal major keys in his result are accompanied by minor keys
in nondecreasing order.

Formally, assume that Bill obtains $R_{p(1)} \ldots R_{p(N)} = R'_1 \ldots R'_N$ after sorting the
minor keys, then $R'_{q(1)} \ldots R'_{q(N)} = R_{p(q(1))} \ldots R_{p(q(N))}$ after sorting the major keys; we
want to show that

$$(K_{p(q(i))}, k_{p(q(i))}) \le (K_{p(q(i+1))}, k_{p(q(i+1))})$$

for $1 \le i < N$. If $K_{p(q(i))} \ne K_{p(q(i+1))}$, we have $K_{p(q(i))} < K_{p(q(i+1))}$; and if
$K_{p(q(i))} = K_{p(q(i+1))}$, then $K'_{q(i)} = K'_{q(i+1)}$, hence $q(i) < q(i+1)$, hence $k'_{q(i)} \le k'_{q(i+1)}$; that is,
$k_{p(q(i))} \le k_{p(q(i+1))}$.
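Bill's procedure can be tried with Python's built-in `sorted`, which is guaranteed stable. The sketch below sorts on the minor key, then stably on the major key, and compares the result with a single sort on the pair (major, minor). The record layout (major, minor, id) and the name `two_pass_sort` are assumptions made for illustration. The id field makes the check sensitive to the order of records whose two keys are both equal.

```python
import random


def two_pass_sort(records):
    """Bill's method: stable sort on the minor key, then stable sort on the major key."""
    by_minor = sorted(records, key=lambda r: r[1])   # sorted() is stable
    return sorted(by_minor, key=lambda r: r[0])


# Usage: random (major, minor, id) records with many equal keys.
rng = random.Random(7)
records = [(rng.randint(0, 3), rng.randint(0, 3), i) for i in range(50)]
```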

3. We can always bring all records with equal keys together, preserving their relative
order, treating these groups of records as a unit in further operations; hence we may
assume that all keys are distinct. Let $a < b < c < a$; then we can arrange things
so that the first three keys are $abc$, $bca$, or $cab$. Now if $N - 1$ distinct keys can be
sorted in three ways, so can $N$; for if $K_1 < \cdots < K_{N-1}$, we always have either
$K_{i-1} < K_N < K_i$ for some $i$, or $K_N < K_1$.
4. First compare words without case distinction, then use case to break ties. More
precisely, replace each word $\alpha$ by the pair $(\alpha', \alpha)$ where $\alpha'$ is obtained from $\alpha$ by
mapping A → a, ..., Z → z; then sort the pairs lexicographically. This procedure
gives, for example, tex < Tex < TeX < TEX < text.
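For plain ASCII words, the pair $(\alpha', \alpha)$ can be written as a Python sort key. There is one wrinkle: ASCII places uppercase before lowercase, but the order above wants tex < Tex. The sketch therefore uses `swapcase()` as the tie-breaker, which reverses the ASCII case order. The name `dictionary_key` is hypothetical, and the sketch ignores the accented letters and punctuation discussed next.

```python
def dictionary_key(word):
    """Compare case-insensitively first; break ties so that lowercase < uppercase.
    Assumes plain ASCII letters."""
    return (word.lower(), word.swapcase())
```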

Dictionaries must also deal with accented letters, prefixes, suffixes, and
abbreviations; for example,

a < A < A < a- < a. < -a < A- < A. < aa < a.a.
< aa < aa < AA < A.A. < AAA < ··· < zz < Zz. < ZZ < zzz < ZZZ.

In this more general situation we obtain $\alpha'$ by mapping accented and uppercase letters
to plain lowercase letters, and dropping the hyphens and periods.

5. Let $\rho(0) = 0$ and $\rho((1\alpha)_2) = 1\rho(|\alpha|)\alpha$; here $(1\alpha)_2$ is the ordinary binary
representation of a positive integer, and $|\alpha|$ is the length of the string $\alpha$. We have $\rho(1) = 10$,
$\rho(2) = 1100$, $\rho(3) = 1101$, $\rho(4) = 1110000$, ..., $\rho(1009) = 111101001111110001$, ...,
$\rho(65536) = 1^5 0^{24}$, ..., $\rho(2^{65536}) = 1^6 0^{65560}$, etc. The length of $\rho(n)$ is

$$|\rho(n)| = \lambda(n) + \lambda(\lambda(n)) + \lambda(\lambda(\lambda(n))) + \cdots + \lg^* n + 1,$$

where $\lambda(0) = 0$, $\lambda(n) = \lfloor \lg n \rfloor$ for $n \ge 1$, and $\lg^* n$ is the least integer $m \ge 0$ such that
$\lambda^{[m]}(n) = 0$. [This construction is due to V. I. Levenshtein, Problemy Kibernetiki 20
(1968), 173-179; see also D. E. Knuth in The Mathematical Gardner, edited by D. A.
Klarner (Belmont, California: Wadsworth International, 1981), 310-325.]
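The recursive definition of $\rho$ translates directly into Python, with code words represented as bit strings. The sketch below defines `rho`, a helper `lam` for $\lambda$, and `predicted_length` for the length formula above; all three names are hypothetical. It reproduces the sample values and compares the length formula with the actual lengths over a range of $n$.

```python
def rho(n):
    """Levenshtein's code: rho(0) = 0, rho((1 alpha)_2) = 1 rho(|alpha|) alpha."""
    if n == 0:
        return "0"
    alpha = bin(n)[3:]            # binary digits after the leading 1
    return "1" + rho(len(alpha)) + alpha


def lam(n):
    """lambda(n) = floor(lg n) for n >= 1, and lambda(0) = 0."""
    return n.bit_length() - 1 if n > 0 else 0


def predicted_length(n):
    """lambda(n) + lambda(lambda(n)) + ... + lg* n + 1."""
    total, steps, x = 0, 0, n
    while x != 0:
        x = lam(x)
        total += x
        steps += 1                # when the loop ends, steps == lg* n
    return total + steps + 1
```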

6. Overflow is possible, and it can lead to a false equality indication. He should have
written "LDA A; CMPA B" and tested the comparison indicator. (The inability to make
full-word comparisons by subtraction is a problem on essentially all computers; it is
the chief reason for including CMPA, ..., CMPX in MIX's repertoire.)


7. COMPARE  STJ   9F
   1H       LDX   A,1
            CMPX  B,1
            JNE   9F
            DEC1  1
            J1P   1B
   9H       JMP   *


8. Solution 1, based on the identity $\min(a, b) = \frac{1}{2}\bigl(a + b - |a - b|\bigr)$:

        LDA   A
        SRAX  5
        DIV   =2=
        STA   A1          a = 2a1 + a2
        STX   A2          |a2| ≤ 1
        LDA   B
        SRAX  5
        DIV   =2=
        STA   B1          b = 2b1 + b2
        STX   B2          |b2| ≤ 1
        LDA   A1
        SUB   B1          no overflow possible
        STA   AB1         a1 − b1
        LDA   A2
        SUB   B2
        STA   AB2         a2 − b2
        SRAX  1
        ADD   AB1
        ENTX  1
        SLAX  5
        MUL   AB2
        STX   AB3         (a2 − b2) sign(a − b)
        LDA   A2
        ADD   B2
        SUB   AB3
        SRAX  5
        DIV   =2=
        ADD   A1
        ADD   B1          no overflow possible
        SUB   AB1(1:5)
        STA   C
Solution 2, based on the fact that indexing can cause interchanges in a tricky way:

SOL2    LDA   A
        STA   C
        STA   TA
        LDA   B
        STA   TB

Now duplicate the following code $k$ times, where $2^k > 10^{10}$:


        LDA   TA
        SRAX  5
        DIV   =2=
        STX   TEMP
        LD1   TEMP
        STA   TA
        LDA   TB
        SRAX  5
        DIV   =2=
        STX   TEMP
        LD2   TEMP
        STA   TB
        INC1  0,2
        INC1  0,2
        INC1  0,2
        LD3   TMIN,1
        LDA   0,3
        STA   C

(This scans the binary representations of a and b from right to left, preserving their
signs.) The program concludes with a table:

        HLT
        CON   C           (-1, -1)
        CON   B           ( 0, -1)
        CON   B           (+1, -1)
        CON   A           (-1,  0)
TMIN    CON   C           ( 0,  0)
        CON   B           (+1,  0)
        CON   A           (-1, +1)
        CON   A           ( 0, +1)
        CON   C           (+1, +1)

9. $\sum_{j \ge 0} (-1)^j \binom{r+j-1}{j}\binom{N}{r+j} x^{r+j}$, by the method of inclusion and exclusion (exercise
1.3.3-26). This can also be written $r\binom{N}{r}\int_0^x t^{r-1}(1-t)^{N-r}\,dt$, a beta distribution.
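The inclusion-exclusion sum can be checked against a direct binomial computation. The sketch below assumes, as the beta form suggests, that the quantity is the probability that at least $r$ of $N$ independent events of probability $x$ occur. Equivalently, it is the probability that the $r$th smallest of $N$ uniform random numbers is at most $x$. The Python code compares both expressions exactly using `fractions.Fraction`. The function names are hypothetical, and $r \ge 1$ is assumed.

```python
from fractions import Fraction
from math import comb


def inclusion_exclusion_form(N, r, x):
    """sum_{j>=0} (-1)^j C(r+j-1, j) C(N, r+j) x^(r+j), for r >= 1."""
    return sum((-1) ** j * comb(r + j - 1, j) * comb(N, r + j) * x ** (r + j)
               for j in range(N - r + 1))


def at_least_r(N, r, x):
    """Probability that at least r of N independent events of probability x occur."""
    return sum(comb(N, k) * x ** k * (1 - x) ** (N - k) for k in range(r, N + 1))


# Usage: exact comparison at a rational point.
same_at_one_third = inclusion_exclusion_form(6, 2, Fraction(1, 3)) == at_least_r(6, 2, Fraction(1, 3))
```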


10. Sort the tape contents, then count. (Some sorting methods make it convenient to
drop records whose keys appear more than once as the sorting progresses.)


11. Assign each person an identification number, which must appear on all forms
concerning that individual. Sort the information forms and the tax forms separately,
with this identification number as the key. Denote the sorted tax forms by $R_1, \ldots, R_N$,
with keys $K_1 < \cdots < K_N$. (There should be no two tax forms with equal keys.) Add
a new $(N + 1)$st record whose key is $\infty$, and set $i \leftarrow 1$. Then, for each record in the
information file, check if it has been reported, as follows: Let $K$ denote the key on the
information form being processed.

a) If $K > K_i$, increase $i$ by 1 and repeat this step.

b) If $K < K_i$, or if $K = K_i$ and the information is not reflected on tax form $R_i$,
signal an error.

Try to do all this processing without wasting the taxpayers' money.
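Steps (a) and (b) are a single merge-like pass over two sorted files. The Python sketch below shows that pass under assumed data shapes. Tax forms are (key, set of reported items) pairs with distinct keys, and information forms are (key, item) pairs. `unreported_items` is a hypothetical name, and `math.inf` plays the role of the sentinel record $R_{N+1}$.

```python
from math import inf


def unreported_items(tax_forms, info_forms):
    """Return the information forms that the check of steps (a) and (b) flags as errors."""
    tax = sorted(tax_forms, key=lambda r: r[0]) + [(inf, frozenset())]   # sentinel R_{N+1}
    errors = []
    i = 0
    for key, item in sorted(info_forms, key=lambda r: r[0]):
        while key > tax[i][0]:                          # step (a)
            i += 1
        if key < tax[i][0] or item not in tax[i][1]:    # step (b)
            errors.append((key, item))
    return errors
```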

12. One way is to attach the key $(j, i)$ to the entry $a_{ij}$ and to sort using lexicographic
order, then omit the keys. (A similar idea can be used to obtain any desired reordering
of information, when a simple formula for the reordering can be given.)

In the special case considered in this problem, the method of "balanced two-way
merge sorting" treats the keys in such a simple manner that it is unnecessary to write
any keys explicitly on the tapes. Given an $n \times n$ matrix, we may proceed as follows:
First put odd-numbered rows on tape 1, even-numbered rows on tape 2, etc., obtaining

Tape 1: $a_{11}\ a_{12} \ldots a_{1n}\ a_{31}\ a_{32} \ldots a_{3n}\ a_{51}\ a_{52} \ldots a_{5n} \ldots$

Tape 2: $a_{21}\ a_{22} \ldots a_{2n}\ a_{41}\ a_{42} \ldots a_{4n}\ a_{61}\ a_{62} \ldots a_{6n} \ldots$

Then rewind these tapes, and process them synchronously, to obtain

Tape 3: $a_{11}\ a_{21}\ a_{12}\ a_{22} \ldots a_{1n}\ a_{2n}\ a_{51}\ a_{61}\ a_{52}\ a_{62} \ldots a_{5n}\ a_{6n} \ldots$

Tape 4: $a_{31}\ a_{41}\ a_{32}\ a_{42} \ldots a_{3n}\ a_{4n}\ a_{71}\ a_{81}\ a_{72}\ a_{82} \ldots a_{7n}\ a_{8n} \ldots$

Rewind these tapes, and process them synchronously, to obtain

Tape 1: $a_{11}\ a_{21}\ a_{31}\ a_{41}\ a_{12} \ldots a_{42} \ldots a_{4n}\ a_{91} \ldots$

Tape 2: $a_{51}\ a_{61}\ a_{71}\ a_{81}\ a_{52} \ldots a_{82} \ldots a_{8n}\ a_{13,1} \ldots$

And so on, until the desired transpose is obtained after $\lceil \lg n \rceil + 1$ passes.
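The tape passes described above can be simulated in memory. In the Python sketch below, each "tape" is a list of blocks. A block covers a run of consecutive rows and holds, for each column, the group of entries from those rows. One merge pass concatenates the corresponding column groups of paired blocks and writes the results alternately to the two output tapes. `tape_transpose` is a hypothetical name, real tape handling is not modeled, and the sketch assumes $n \ge 1$. It returns the transpose in row-major order together with the number of passes.

```python
def tape_transpose(a):
    """Transpose an n x n matrix (list of rows) by balanced two-way merging.
    Returns (entries of the transpose in row-major order, number of passes)."""
    tapes = [[], []]
    for r, row in enumerate(a):                  # pass 1: rows alternate between tapes
        tapes[r % 2].append([[x] for x in row])  # a block: one group per column
    passes = 1
    while len(tapes[0]) + len(tapes[1]) > 1:
        out = [[], []]
        first, second = tapes
        for b in range(len(first)):              # the first tape never has fewer blocks
            if b < len(second):
                block = [g1 + g2 for g1, g2 in zip(first[b], second[b])]
            else:
                block = first[b]
            out[b % 2].append(block)             # output blocks alternate tapes
        tapes = out
        passes += 1
    return [x for group in tapes[0][0] for x in group], passes
```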

13. One way is to attach random distinct key values, sort on those keys, then discard
the keys. (See exercise 12; a similar method for obtaining a random sample was
discussed in Section 3.4.2.) Another technique, involving about the same amount of
work but apparently not straining the accuracy of the random number generator as
much, is to attach a random integer in the range $0 \le K_i \le N - i$ to $R_i$, then rearrange
using the technique of exercise 5.1.1-5.
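The second technique treats $K_1 \ldots K_N$ as an inversion table. Inversion tables with $0 \le b_j \le N - j$ correspond one-to-one with permutations (Section 5.1.1), so uniform random $K$'s give a uniform random arrangement. The Python sketch below reconstructs the permutation by simple insertion, which may differ from the particular method of exercise 5.1.1-5. The names `from_inversion_table` and `random_arrangement` are hypothetical.

```python
import random
from itertools import product


def from_inversion_table(b):
    """Permutation of 1..n whose inversion table is b: b[j-1] counts the elements
    greater than j that appear to the left of j, so 0 <= b[j-1] <= n - j."""
    perm = []
    for j in range(len(b), 0, -1):
        perm.insert(b[j - 1], j)     # every element already placed exceeds j
    return perm


def random_arrangement(records, rng=random):
    """Attach K_i in 0..N-i to R_i, then rearrange using the K's as an inversion table."""
    n = len(records)
    keys = [rng.randint(0, n - i) for i in range(1, n + 1)]
    return [records[j - 1] for j in from_inversion_table(keys)]


# Usage: every inversion table for n = 4 yields a different permutation.
all_tables_4 = list(product(*(range(5 - i) for i in range(1, 5))))
```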

14. With a character-conversion table, you can design a lexicographic comparison
routine that simulates the order used on the other machine. Alternatively, you could create
artificial keys, different from the actual characters but giving the desired ordering. The
latter method has the advantage that it needs to be done only once; but it takes more
space and requires conversion of the entire key. The former method can often determine
the result of a comparison by converting only one or two letters of the keys; during
later stages of sorting, the comparison will be between nearly equal keys, however,
and the former method may find it advantageous to check for equality of letters before
converting them.

15. For this problem, just run through the file once keeping 50 or so individual counts.
But if "city" were substituted for "state," and if the total number of cities were quite
large, it would be a good idea to sort on the city name.

16. As in exercise 15, it depends on the size of the problem. If the total number of
cross-reference entries fits into high-speed memory, the best approach is probably to
use a symbol table algorithm (Chapter 6) with each identifier associated with the head
of a linked list of references. For larger problems, create a file of records, one record
for each cross-reference citation to be put in the index, and sort it.
17. Carry along with each card a "shadow key" that, sorted lexicographically in the
usual simple way, will define the desired ordering. This key is to be supplied by library
personnel and attached to the catalog data when it first enters the system, although
it is not visible to normal users. A possible key uses the following two-letter codes to
separate words from each other:

⊔0  end of key;
⊔1  end of cross-reference card;
⊔2  end of surname;
⊔3  hyphen of multiple surname;
⊔4  end of author name;
⊔5  end of place name;
⊔6  end of subject heading;
⊔7  end of book title;
⊔8  space between words.


The given example would come out as follows (showing only the first 25 characters):

ACCADEMIA⊔8NAZIONALE⊔8DEI
ACHTZEHNHUNDERTZWOLF⊔8EIN
BIBLIOTHEQUE⊔8DU⊔8HISTOIRE
BIBLIOTHEQUE⊔8DES⊔8CURIOS
BROWN⊔2J⊔8CROSBY⊔4⊔0
BROWN⊔2JOHN⊔4⊔0
BROWN⊔2JOHN⊔4MATHEMATICIA
BROWN⊔2JOHN⊔4OF⊔8BOSTON⊔0
BROWN⊔2JOHN⊔41715⊔0
BROWN⊔2JOHN⊔41715⊔6⊔0
BROWN⊔2JOHN⊔41761⊔0
BROWN⊔2JOHN⊔41810⊔0
BROWN⊔3WILLIAMS⊔2REGINALD
BROWN⊔8AMERICA⊔7⊔0
BROWN⊔8AND⊔8DALLISONS⊔8NE
BROWNJOHN⊔2ALAN⊔4⊔0
DEN⊔2VLADIMIR⊔8EDUARDOVIC
DEN⊔7⊔0
DEN⊔8LIEBEN⊔8LANGEN⊔8TAG⊔
DIX⊔2MORGAN⊔41827⊔0
DIX⊔8HUIT⊔8CENT⊔8DOUZE⊔8O
DIX⊔8NEUVIEME⊔8SIECLE⊔8FR
EIGHTEEN⊔8FORTY⊔8SEVEN⊔8I
EIGHTEEN⊔8TWELVE⊔8OVERTUR
I⊔8AM⊔8A⊔8MATHEMATICIAN⊔7
I⊔8B⊔8M⊔8JOURNAL⊔8OF⊔8RES
I⊔8HA⊔8EHAD⊔7⊔0
IA⊔8A⊔8LOVE⊔8STORY⊔7⊔0
INTERNATIONAL⊔8BUSINESS⊔8
KHUWARIZMI⊔2MUHAMMAD⊔8IBN
LABOR⊔7A⊔8MAGAZINE⊔8FOR⊔8
LABOR⊔8RESEARCH⊔8ASSOCIAT
LABOUR⊔1⊔0
MACCALLS⊔8COOKBOOK⊔7⊔0
MACCARTHY⊔2JOHN⊔41927⊔0
MACHINE⊔8INDEPENDENT⊔8COM
MACMAHON⊔2PERCY⊔8ALEXANDE
MISTRESS⊔8DALLOWAY⊔7⊔0
MISTRESS⊔8OF⊔8MISTRESSES⊔
ROYAL⊔8SOCIETY⊔8OF⊔8LONDO
SAINT⊔8PETERSBURGER⊔8ZEIT
SAINT⊔8SAENS⊔2CAMILLE⊔418
SAINTE⊔8MARIE⊔2GASTON⊔8PU
SEMINUMERICAL⊔8ALGORITHMS
UNCLE⊔8TOMS⊔8CABIN⊔7⊔0
UNITED⊔8STATES⊔8BUREAU⊔8O
VANDERMONDE⊔2ALEXANDER⊔8T
VANVALKENBURG⊔2MAC⊔8ELWYN
VONNEUMANN⊔2JOHN⊔41903⊔0
WHOLE⊔8ART⊔8OF⊔8LEGERDEMA
WHOS⊔8AFRAID⊔8OF⊔8VIRGINI
WIJNGAARDEN⊔2ADRIAAN⊔8VAN


This auxiliary key should be followed by the card data, so that unequal cards having
the same auxiliary key (e.g., Sir John = John) are distinguished properly. Notice that
"Saint-Saëns" is a hyphenated name but not a compound name. The birth year of
al-Khuwārizmī should be given as, say, ⊔40779 with a leading zero. (This scheme will
work until the year 9999, after which the world will face a huge software crisis.)

Careful study of this example reveals how to deal with many other unusual types
of order that are needed in human-computer interaction.
18. For example, we can make two files containing values of $(u^6 + v^6 + w^6) \bmod m$
and $(z^6 - x^6 - y^6) \bmod m$ for $u < v < w$, $x < y < z$, where $m$ is the word size of our
computer. Sort these and look for duplicates, then subject the duplicates to further
tests. (Some congruences modulo small primes might also be used to place further
restrictions on $u, v, w, x, y, z$.)

19. In general, to find all pairs of numbers $\{x_i, x_j\}$ with $x_i + x_j = c$, where $c$ is given:
Sort the file so that $x_1 \le x_2 \le \cdots \le x_N$. Set $i \leftarrow 1$, $j \leftarrow N$, and then repeat the
following operation until $j \le i$:

If $x_i + x_j = c$, output $\{x_i, x_j\}$, set $i \leftarrow i + 1$, $j \leftarrow j - 1$;

If $x_i + x_j < c$, set $i \leftarrow i + 1$;

If $x_i + x_j > c$, set $j \leftarrow j - 1$.

Finally if $j = i$ and $2x_i = c$, output $\{x_i, x_i\}$. This process is like the method of
exercise 18: We are essentially making two sorted files, one containing $x_1, \ldots, x_N$ and
the other containing $c - x_N, \ldots, c - x_1$, and checking for duplicates. But the second file
doesn't need to be explicitly formed in this case. Another approach, suggested by Jiang
Ling, is to sort on a key such as $(x > c/2 \Rightarrow x,\ x < c/2 \Rightarrow c - x)$.

A similar algorithm can be used to find $\max\{x_i + x_j \mid x_i + x_j < c\}$; or to find, say,
$\min\{x_i + y_j \mid x_i + y_j > t\}$ given $t$ and two sorted files $x_1 \le \cdots \le x_m$, $y_1 \le \cdots \le y_n$.
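The scan above transcribes directly into Python with 0-based indices. The sketch below assumes the values are distinct; with repeated values, this scan does not list every pair of positions. `pairs_with_sum` is a hypothetical name.

```python
def pairs_with_sum(xs, c):
    """All pairs {x_i, x_j} with x_i + x_j = c, found by the two-ended scan above."""
    x = sorted(xs)
    i, j = 0, len(x) - 1
    out = []
    while i < j:
        s = x[i] + x[j]
        if s == c:
            out.append((x[i], x[j]))
            i, j = i + 1, j - 1
        elif s < c:
            i += 1
        else:
            j -= 1
    if i == j and 2 * x[i] == c:          # the middle element paired with itself
        out.append((x[i], x[i]))
    return out
```

For example, `pairs_with_sum([8, 1, 5, 3, 7, 4], 8)` returns `[(1, 7), (3, 5), (4, 4)]`.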

20. Some of the alternatives are: (a) For each of the 499,500 pairs $i, j$, with $1 \le i <
j \le 1000$, set $y_1 \leftarrow x_i \oplus x_j$, $y_2 \leftarrow y_1 \mathbin{\&} (y_1 - 1)$, $y_3 \leftarrow y_2 \mathbin{\&} (y_2 - 1)$; then print $(x_i, x_j)$
if and only if $y_3 = 0$. Here $\oplus$ denotes "exclusive or" and $\&$ denotes "bitwise and".
(b) Create a file with 31,000 entries, forming 31 entries from each original word $x_i$ by
including $x_i$ and the 30 words that differ from $x_i$ in one position. Sort this file and
look for duplicates. (c) Do a test analogous to (a) on

i) all pairs of words that agree in their first 10 bits;

ii) all pairs of words that agree in their middle 10 bits, but not the first 10;

iii) all pairs of words that agree in their last 10 bits, but neither the first nor middle 10.

This involves three sorts of the data, using a specified 10-bit key each time. The
expected number of pairs in each of the three cases is at most $499500/2^{10}$, which is less
than 500, if the original words are randomly distributed.
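Methods (a) and (c) can be compared directly on random 30-bit words. The Python sketch below uses hypothetical names and assumes the first 10 bits are the high-order ones. The function `close` applies the $y_1, y_2, y_3$ test of (a), which removes the two lowest 1 bits. `close_pairs_by_segments` runs method (c) with one sort per 10-bit segment. Any pair differing in at most two bits must agree in at least one of the three segments, so both methods should report the same pairs.

```python
import random
from itertools import combinations, groupby


def close(a, b):
    """Method (a): a and b differ in at most two bit positions."""
    y1 = a ^ b
    y2 = y1 & (y1 - 1)          # clear the lowest 1 bit
    y3 = y2 & (y2 - 1)          # clear the next one
    return y3 == 0


def close_pairs_brute(words):
    return {(i, j) for i, j in combinations(range(len(words)), 2)
            if close(words[i], words[j])}


def close_pairs_by_segments(words):
    """Method (c) for 30-bit words: one sort per 10-bit segment."""
    segment = [lambda w: w >> 20, lambda w: (w >> 10) & 0x3FF, lambda w: w & 0x3FF]
    found = set()
    for s in range(3):
        order = sorted(range(len(words)), key=lambda i: segment[s](words[i]))
        for _, run in groupby(order, key=lambda i: segment[s](words[i])):
            for i, j in combinations(sorted(run), 2):
                # skip pairs that an earlier segment has already handled
                if any(segment[e](words[i]) == segment[e](words[j]) for e in range(s)):
                    continue
                if close(words[i], words[j]):
                    found.add((i, j))
    return found


# Usage: 300 random words plus three planted neighbors (1, 2, and 3 bits away).
rng = random.Random(5)
words = [rng.getrandbits(30) for _ in range(300)]
words += [words[0] ^ (1 << 3), words[1] ^ (1 << 29) ^ 1, words[2] ^ 0b111]
```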

21. First prepare a file containing all five-letter English words. (Be sure to consider
adding suffixes such as -ED, -ER, -ERS, -S to shorter words.) Now take each
five-letter word $\alpha$ and sort its letters into ascending order, obtaining the sorted five-letter
sequence $\alpha'$. Finally sort all pairs $(\alpha', \alpha)$ to bring all anagrams together.

Experiments by Kim D. Gibson in 1967 showed that the second longest set of
commonly known five-letter anagrams is LEAST, SLATE, STALE, STEAL, TAELS, TALES,
TEALS. But if he had been able to use larger dictionaries, he would have been able to
catapult this set into first place, by adding the words ALETS (steel shoulderplates), ASTEL
(a splinter), ATLES (intends), LAETS (people who rank between slaves and freemen),
LASET (an ermine), LATES (a Nile perch), LEATS (watercourses), SALET (a mediaeval
helmet), SETAL (pertaining to setae), SLEAT (to incite), STELA (a column), and TESLA
(a unit of magnetic flux density). Together with the old spellings SATEL, TASEL, and
TASLE for "settle" and "teasel," we obtain 22 mutually permutable words, none of which
needs to be spelled with an uppercase letter. And with a bit more daring we might
add the Old English tæsl, German altes, and Madame de Staël! The set {LAPSE, LEAPS,
PALES, PEALS, PLEAS, SALEP, SEPAL} can also be extended to at least 14 words when we
turn to unabridged dictionaries. [See H. E. Dudeney, Strand 65 (1923), 208, 312, and
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his  300  Best  Word  Puzzles,  edited  by  Martin  Gardner  (1968),  Puzzles  190  and  194; 
Ross  Eckler,  Making  the  Alphabet  Dance  (St.  Martin’s  Griffin,  1997),  Fig.  46c.] 

The first and last sets of three or more five-letter English anagrams are {ALBAS,
BALAS, BALSA, BASAL} and {STRUT, STURT, TRUST}, if proper names are not allowed. However,
the proper names Alban, Balan, Laban, and Nabal lead to an earlier set {ALBAN,
BALAN, BANAL, LABAN, NABAL, NABLA} if that restriction is dropped. The most striking
example of longer anagram words in common English is perhaps the amazingly mathematical
set {ALERTING, ALTERING, INTEGRAL, RELATING, TRIANGLE}.

A faster way to proceed is to compute f(α) = (h(a_1) + h(a_2) + ··· + h(a_5)) mod m,
where a_1, ..., a_5 are numerical codes for the individual letters in α, and (h(1), h(2), ...)
are 26 randomly selected constants; here m is, say, 2^{⌊2 lg N⌋} when there are N words.
Sorting the file (f(α), α) with two passes of Algorithm 5.2.5R will bring anagrams
together; afterwards when f(α) = f(β) we must make sure that we have a true anagram
with α' = β'. The value f(α) can be calculated more rapidly than α', and this method
avoids the determination of α' for most of the words α in the file.
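A sketch of the hashing variant, under stated assumptions: words are uppercase A–Z, the 26 constants h(·) come from Python's `random` module with a fixed seed, and a dictionary of buckets stands in for the two radix-sort passes of Algorithm 5.2.5R. The confirming test α' = β' is carried out only inside buckets holding more than one word, so most words never have their letters sorted.

```python
import random
from collections import defaultdict


def anagram_classes_hashed(words, seed=1):
    """Bring anagrams together via f(alpha) = (h(a_1) + ... + h(a_5)) mod m."""
    rng = random.Random(seed)
    n = max(len(words), 2)
    m = 1 << ((n * n).bit_length() - 1)        # m = 2^floor(2 lg N)
    h = {c: rng.randrange(m) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
    buckets = defaultdict(list)                # stands in for sorting on f(alpha)
    for w in words:
        buckets[sum(h[c] for c in w) % m].append(w)
    classes = []
    for bucket in buckets.values():
        if len(bucket) < 2:
            continue                           # alpha' is never computed here
        # equal f values need not mean anagrams, so confirm with alpha' = beta'
        exact = defaultdict(list)
        for w in bucket:
            exact[''.join(sorted(w))].append(w)
        classes.extend(g for g in exact.values() if len(g) > 1)
    return classes
```

Since f depends only on the multiset of letters, true anagrams always collide; the seed affects only which non-anagrams happen to share a bucket, never the final classes.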

Note: A similar technique can be used when we want to bring together all sets of
records that have equal multiword keys (a_1, ..., a_n). Suppose that we don’t care about
the order of the file, except that records with equal keys are to be brought together; it
is sometimes faster to sort on the one-word key (a_1 x^{n−1} + a_2 x^{n−2} + ··· + a_n) mod m,
where x is any fixed value, instead of sorting on the original multiword key.
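A small sketch of the one-word key, assuming integer key components. The values `x = 1_000_003` and `m = 2^61 − 1` are arbitrary illustrative choices, not values from the text. The full key serves only as a tie-breaker between records whose one-word keys coincide, so equal keys always end up adjacent.

```python
def one_word_key(key, x, m):
    """(a_1 x^(n-1) + a_2 x^(n-2) + ... + a_n) mod m, by Horner's rule."""
    r = 0
    for a in key:
        r = (r * x + a) % m
    return r


def bring_equal_keys_together(records, key_of, x=1_000_003, m=(1 << 61) - 1):
    # Records with equal multiword keys get equal one-word keys; the full key
    # is compared only to break ties between one-word keys.
    return sorted(records, key=lambda r: (one_word_key(key_of(r), x, m), key_of(r)))
```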

22. Find isomorphic invariants of the graphs (functions that take equal values on
isomorphic directed graphs) and sort on these invariants, to separate “obviously nonisomorphic”
graphs from each other. Examples of isomorphic invariants: (a) Represent
vertex v_i by (a_i, b_i), where a_i is its in-degree and b_i is its out-degree; then sort the
pairs (a_i, b_i) into lexicographic order. The resulting file is an isomorphic invariant.
(b) Represent an arc from v_i to v_j by (a_i, b_i, a_j, b_j), and sort these quadruples into
lexicographic order. (c) Separate the directed graph into connected components (see
Algorithm 2.3.3E), determine invariants of each component, and sort the components
into order of their invariants in some way. See also the discussion in exercise 21.
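Invariants (a) and (b) are easy to compute. This Python sketch assumes a hypothetical input format, where a digraph is a vertex count n together with a list of arcs (u, v) on vertices 0..n−1. Invariant (c) is omitted.

```python
def degree_invariants(n, arcs):
    """Invariants (a) and (b) of a digraph on vertices 0..n-1 with the given arcs."""
    indeg, outdeg = [0] * n, [0] * n
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    vertices = sorted((indeg[i], outdeg[i]) for i in range(n))                      # (a)
    arc_keys = sorted((indeg[u], outdeg[u], indeg[v], outdeg[v]) for u, v in arcs)  # (b)
    return vertices, arc_keys


def sort_by_invariants(graphs):
    """Sort (n, arcs) pairs so that graphs with equal invariants become adjacent."""
    return sorted(graphs, key=lambda g: degree_invariants(*g))
```

Graphs with different invariants are certainly nonisomorphic. Graphs that land next to each other with equal invariants still need the secondary tests described next.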

After  sorting  the  directed  graphs  on  their  invariants,  it  will  still  be  necessary  to 
make  secondary  tests  to  see  whether  directed  graphs  with  identical  invariants  are  in  fact 
isomorphic.  The  invariants  are  helpful  for  these  tests  too.  In  the  case  of  free  trees  it  is 
possible  to  find  “characteristic”  or  “canonical”  invariants  that  completely  characterize 
the  tree,  so  that  secondary  testing  is  unnecessary  [see  J.  Hopcroft  and  R.  E.  Tarjan,  in 
Complexity  of  Computer  Computations  (New  York:  Plenum,  1972),  140-142]. 

23. One way is to form a file containing all three-person cliques, then transform it into
a file containing all four-person cliques, etc.; if there are no large cliques, this method
will be quite satisfactory. (On the other hand, if there is a clique of size n, there are at
least $\binom{n}{k}$ cliques of size k; so this method can blow up even when n is only 25 or so.)

Given a file that lists all (k − 1)-person cliques, in the form (a_1, ..., a_{k−1}) where
a_1 < ··· < a_{k−1}, we can find the k-person cliques by (i) creating a new file containing
the entries (b, c, a_1, ..., a_{k−2}) for each pair of (k − 1)-person cliques of the respective
forms (a_1, ..., a_{k−2}, b), (a_1, ..., a_{k−2}, c) with b < c; (ii) sorting this file on its first
two components; (iii) for each entry (b, c, a_1, ..., a_{k−2}) in this new file that matches
a pair (b, c) of acquaintances in the originally given file, output the k-person clique
(a_1, ..., a_{k−2}, b, c).
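Steps (i)–(iii) can be sketched with in-memory lists standing in for the files. The representation is an assumption made for illustration: cliques are increasing tuples, and acquaintances are a list of pairs.

```python
from itertools import combinations


def next_cliques(cliques, edges):
    """From all (k-1)-cliques (increasing tuples), produce all k-cliques."""
    acquainted = sorted({(min(u, v), max(u, v)) for u, v in edges})
    cliques = sorted(cliques)
    entries, i = [], 0
    while i < len(cliques):                 # (i) cliques sharing a_1..a_{k-2} are adjacent
        j = i
        while j < len(cliques) and cliques[j][:-1] == cliques[i][:-1]:
            j += 1
        for s, t in combinations(cliques[i:j], 2):
            entries.append((s[-1], t[-1]) + s[:-1])   # (b, c, a_1, ..., a_{k-2}), b < c
        i = j
    entries.sort()                          # (ii) sort on the first two components
    result, e = [], 0
    for entry in entries:                   # (iii) merge with the acquaintance file
        while e < len(acquainted) and acquainted[e] < entry[:2]:
            e += 1
        if e < len(acquainted) and acquainted[e] == entry[:2]:
            result.append(entry[2:] + entry[:2])      # (a_1, ..., a_{k-2}, b, c)
    return sorted(result)
```

Starting from the acquaintance pairs themselves as the 2-cliques, repeated calls yield the 3-cliques, 4-cliques, and so on.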

24. (Solution by Norman Hardy, c. 1967.) Make another copy of the input file; sort
one copy on the first components and the other on the second. Passing over these
files in sequence now allows us to create a new file containing all pairs (x_i, x_{i+2}) for
1 ≤ i ≤ N − 2, and to identify (x_{N−1}, x_N). The pairs (N − 1, x_{N−1}) and (N, x_N) should
be written on still another file.

The process continues inductively. Assume that file F contains all pairs (x_i, x_{i+t})
for 1 ≤ i ≤ N − t, in random order, and that file G contains all pairs (i, x_i) for
N − t < i ≤ N in order of the second components. Let H be a copy of file F, and
sort H by first components, F by second. Now go through F, G, and H, creating two
new files F' and G', as follows. If the current records of files F, G, H are, respectively,
(x, x'), (y, y'), (z, z'), then:

i) If x' = z, output (x, z') to F' and advance files F and H.

ii) If x' = y', output (y − t, x) to G' and advance files F and G.

iii) If x' > y', advance file G.

iv) If x' > z, advance file H.

When file F is exhausted, sort G' by second components and merge G with it; then
replace t by 2t, F by F', G by G'.

Thus t takes the values 2, 4, 8, ...; and for fixed t we do O(log N) passes over the
data to sort it. Hence the total number of passes is O((log N)²). Eventually t ≥ N, so
F is empty; then we simply sort G on its first components.
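Here is a simulation of the doubling scheme, with Python lists standing in for the tape files and `sorted` and `heapq.merge` standing in for external sorts and merges. Two simplifications are assumptions of this sketch. First, the loop starts at t = 1, after one preliminary pass that identifies x_N, instead of making the special first pass to t = 2. Second, the x_i are taken to be distinct, mutually comparable values with N ≥ 2.

```python
import heapq


def hardy_sequence(pairs):
    """Recover x_1 ... x_N from the unordered pairs (x_i, x_{i+1}), 1 <= i < N."""
    n = len(pairs) + 1
    firsts = sorted(x for x, _ in pairs)
    seconds = sorted(y for _, y in pairs)
    j, last = 0, None                          # x_N: a second component never first
    for s in seconds:
        while j < len(firsts) and firsts[j] < s:
            j += 1
        if j == len(firsts) or firsts[j] != s:
            last = s
            break
    F, G, t = list(pairs), [(n, last)], 1      # F: (x_i, x_{i+t}); G: (i, x_i), i > N - t
    while F:
        H = sorted(F)                          # H by first components
        F.sort(key=lambda r: r[1])             # F by second components
        F_new, G_new, g, h = [], [], 0, 0
        for x, x2 in F:                        # (x, x') is the current record of F
            while g < len(G) and G[g][1] < x2:     # rule iii
                g += 1
            while h < len(H) and H[h][0] < x2:     # rule iv
                h += 1
            if h < len(H) and H[h][0] == x2:       # rule i
                F_new.append((x, H[h][1]))
                h += 1
            elif g < len(G) and G[g][1] == x2:     # rule ii
                G_new.append((G[g][0] - t, x))
                g += 1
        G_new.sort(key=lambda r: r[1])         # sort G' by second components ...
        G = list(heapq.merge(G, G_new, key=lambda r: r[1]))   # ... and merge G with it
        F, t = F_new, 2 * t
    G.sort()                                   # finally sort G on first components
    return [x for _, x in G]
```

Every record of F matches exactly one of H (rule i) or G (rule ii), because each x_{i+t} is either a first component in H or a second component in G.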

25. (An idea due to D. Shanks.) Prepare two files, one containing a^{mn} mod p and the
other containing b a^{−n} mod p for 0 ≤ n < m. Sort these files and find a common entry.
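Shanks's method (widely known as baby-step giant-step) can be sketched as below. The sketch assumes that p is prime and that we want an exponent n with a^n mod p = b. The choice m = ⌊√(p − 1)⌋ + 1, which makes m² > p − 1, is ours. The two "files" are sorted lists, and a merge-like scan finds the common entry.

```python
from math import isqrt


def discrete_log(a, b, p):
    """Find n with a^n mod p = b by matching two sorted files (Shanks)."""
    m = isqrt(p - 1) + 1                      # m * m > p - 1
    giant = sorted((pow(a, m * n, p), n) for n in range(m))         # a^(mn) mod p
    a_inv = pow(a, p - 2, p)                  # a^(-1) mod p, since p is prime
    baby = sorted((b * pow(a_inv, n, p) % p, n) for n in range(m))  # b a^(-n) mod p
    i = j = 0
    while i < m and j < m:                    # scan both sorted files for a common entry
        if giant[i][0] < baby[j][0]:
            i += 1
        elif giant[i][0] > baby[j][0]:
            j += 1
        else:                                 # a^(m n1) = b a^(-n2), so b = a^(m n1 + n2)
            return (m * giant[i][1] + baby[j][1]) % (p - 1)
    return None
```

Each file has m ≈ √p entries, so the two sorts dominate the running time, giving the Θ(√p log p) bound mentioned in the note that follows.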

Note: This reduces the worst-case running time from O(p) to Θ(√p log p). Significant
further improvements are often possible; for example, we can easily determine if n
is even or odd, in log p steps, by testing whether b^{(p−1)/2} mod p = 1 or (p − 1). In general
if f is any divisor of p − 1 and d is any divisor of gcd(f, n), we can similarly determine
(n/d) mod f by looking up the value of … in a table of length f/d. If p − 1 has
the prime factors q_1 ≤ q_2 ≤ ··· ≤ q_t and if q_t is small, we can therefore compute n
rapidly by finding the digits from right to left in its mixed-radix representation, for
radices q_1, ..., q_t. (This idea is due to R. L. Silver, 1964; see also S. C. Pohlig and
M. Hellman, IEEE Transactions IT-24 (1978), 106–110.)

John M. Pollard discovered an elegant way to compute discrete logs with about
O(√p) operations mod p, requiring very little memory, based on the theory of random
mappings. See Math. Comp. 32 (1978), 918–924, where he also suggests another
method based on numbers n_j = r^j mod p that have only small prime factors.

Asymptotically  faster  methods  are  discussed  in  exercise  4.5.4-46. 

SECTION  5.1.1 

1. 2 0 5 2 2 3 0 0 0; 2 7 3 5 4 1 8 6.

2. b_1 = (m − 1) mod n; b_{j+1} = (b_j + m − 1) mod (n − j).

3.  dj  = dn+i -j  (the  “reflected”  permutation).  This  idea  was  used  by  O.  Terquem 
[Journ. de Math. 3 (1838), 559–560] to prove that the average number of inversions in
a random permutation is $\frac12\binom{n}{2}$.

4. C1. Set x_0 ← 0. (It is possible to let x_j share memory with b_j in what follows,
for 1 ≤ j ≤ n.)

C2. For k = n, n − 1, ..., 1 (in this order) do the following: Set j ← 0; then set
j ← x_j exactly b_k times; then set x_k ← x_j and x_j ← k.

C3. Set j ← 0.

C4. For k = 1, 2, ..., n (in this order), do the following: Set a_k ← x_j; then set
j ← x_j. ∎
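Algorithm C translates almost line for line into Python. In this sketch the list `x` holds the links, index 0 serves as both the list head and the terminator, and the inversion table is passed as an ordinary 0-indexed list.

```python
def permutation_from_inversion_table(b):
    """Algorithm C: from b_1 ... b_n (a 0-indexed list b) to a_1 ... a_n."""
    n = len(b)
    x = [0] * (n + 1)             # C1: x[0] = 0 is the empty list; x[k] links node k
    for k in range(n, 0, -1):     # C2: k = n, n-1, ..., 1
        j = 0
        for _ in range(b[k - 1]): # follow b_k links from the head
            j = x[j]
        x[k], x[j] = x[j], k      # insert k just after node j
    a, j = [], 0                  # C3
    for _ in range(n):            # C4: read the list in order
        a.append(x[j])
        j = x[j]
    return a
```

For example, the inversion table 2 3 6 4 0 2 2 1 0 yields 5 9 1 8 2 6 4 7 3, and 2 0 5 2 2 3 0 0 0 yields 2 7 1 8 4 5 9 3 6.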

To  save  memory  space,  see  exercise  5.2-12. 

5. Let α be a string [m_1, n_1] ... [m_k, n_k] of ordered pairs of nonnegative integers; we
write |α| = k, the length of α. Let ε denote the empty (length 0) string. Consider the
binary operation ∘ defined recursively on pairs of such strings as follows:

$$\epsilon\circ\alpha=\alpha\circ\epsilon=\alpha;$$

$$([m,n]\,\alpha)\circ([m',n']\,\beta)=\begin{cases}[m,n]\,\bigl(\alpha\circ([m'-m,n']\,\beta)\bigr), & \text{if } m\le m';\\ [m',n']\,\bigl(([m-m'-1,n]\,\alpha)\circ\beta\bigr), & \text{if } m>m'.\end{cases}$$

It follows that the computation time required to evaluate α ∘ β is proportional to
|α ∘ β| = |α| + |β|. Furthermore, we can prove that ∘ is associative and that [b_1, 1] ∘
[b_2, 2] ∘ ··· ∘ [b_n, n] = [0, a_1][0, a_2] ... [0, a_n]. The expression on the left can be evaluated
in ⌈lg n⌉ passes, each pass combining pairs of strings, for a total of O(n log n) steps.

Example: Starting from (2), we want to evaluate [2, 1] ∘ [3, 2] ∘ [6, 3] ∘ [4, 4] ∘ [0, 5] ∘
[2, 6] ∘ [2, 7] ∘ [1, 8] ∘ [0, 9]. The first pass reduces this to [2, 1][1, 2] ∘ [4, 4][1, 3] ∘ [0, 5][2, 6] ∘
[1, 8][0, 7] ∘ [0, 9]. The second pass reduces it to [2, 1][1, 2][1, 4][1, 3] ∘ [0, 5][1, 8][0, 6][0, 7] ∘
[0, 9]. The third pass yields [0, 5][1, 1][0, 8][0, 2][0, 6][0, 4][0, 7][0, 3] ∘ [0, 9]. The fourth
pass yields (1).
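The operation ∘ and the pass-by-pass evaluation can be checked mechanically. This Python sketch represents each string as a list of (m, n) tuples. As an implementation choice, it unrolls the recursive definition into a loop to avoid deep recursion.

```python
def compose(alpha, beta):
    """alpha o beta for strings given as lists of (m, n) pairs."""
    alpha, beta = list(alpha), list(beta)   # copies: head pairs get rewritten
    out, i, j = [], 0, 0
    while i < len(alpha) and j < len(beta):
        m, n = alpha[i]
        m2, n2 = beta[j]
        if m <= m2:
            out.append((m, n))              # [m,n](alpha o ([m'-m, n'] beta))
            beta[j] = (m2 - m, n2)
            i += 1
        else:
            out.append((m2, n2))            # [m',n'](([m-m'-1, n] alpha) o beta)
            alpha[i] = (m - m2 - 1, n)
            j += 1
    return out + alpha[i:] + beta[j:]       # eps o gamma = gamma o eps = gamma


def permutation_from_inversions(b):
    """[b_1,1] o [b_2,2] o ... o [b_n,n] = [0,a_1] ... [0,a_n], pass by pass."""
    if not b:
        return []
    strings = [[(bk, k)] for k, bk in enumerate(b, start=1)]
    while len(strings) > 1:                 # one pass combines adjacent pairs
        nxt = [compose(strings[i], strings[i + 1]) for i in range(0, len(strings) - 1, 2)]
        if len(strings) % 2:
            nxt.append(strings[-1])         # the odd string out waits for the next pass
        strings = nxt
    return [n for _, n in strings[0]]
```

Pairing adjacent strings in each pass is legitimate because ∘ is associative. On the inversion table (2) it reproduces the four passes listed above and returns 5 9 1 8 2 6 4 7 3.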

Motivation: A string such as [4, 4][1, 3] stands for “⊔⊔⊔⊔4⊔3⊔^∞”, where “⊔” denotes
a blank; the operation α ∘ β inserts the blanks and nonblanks of β into the blanks of α.
Note that, together with exercise 2, we obtain an algorithm for the Josephus problem
that is O(n log n) instead of O(mn), partially answering a question raised in exercise
1.3.2–22.

Another O(n log n) solution to this problem, using a random-access memory, follows
from the use of balanced trees in a straightforward manner.

6. Start with b_1 = b_2 = ··· = b_n = 0. For k = ⌊lg n⌋, ⌊lg n⌋ − 1, ..., 0 do the
following: Set x_s ← 0 for 0 ≤ s ≤ n/2^{k+1}; then for j = 1, 2, ..., n do the following:
Set r ← ⌊a_j/2^k⌋ mod 2, s ← ⌊a_j/2^{k+1}⌋ (these are essentially bit extractions); if r = 0,
set b_{a_j} ← b_{a_j} + x_s, and if r = 1 set x_s ← x_s + 1.
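The bit-extraction method can be sketched in Python for a permutation of {1, ..., n} given as a list. Python's shift and mask operators play the role of the bit extractions.

```python
def inversion_table(a):
    """b_1 ... b_n of a permutation a of {1, ..., n}, by the bit-extraction method."""
    n = len(a)
    b = [0] * (n + 1)                             # b[v] holds b_v; b[0] is unused
    for k in range(n.bit_length() - 1, -1, -1):   # k = floor(lg n), ..., 1, 0
        x = [0] * ((n >> (k + 1)) + 1)            # x_s for 0 <= s <= n / 2^(k+1)
        for v in a:                               # v = a_j for j = 1, 2, ..., n
            r = (v >> k) & 1                      # r = floor(a_j / 2^k) mod 2
            s = v >> (k + 1)                      # s = floor(a_j / 2^(k+1))
            if r == 0:
                b[v] += x[s]      # earlier elements with the same high bits and bit k = 1
            else:
                x[s] += 1
    return b[1:]
```

Each larger element to the left of a_j is counted exactly once, at the most significant bit where it differs from a_j. The total work is therefore O(n log n).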

Another  solution  appears  in  exercise  5.2.4-21. 

7. B_j < j and C_j ≤ n − j, since a_j has j − 1 elements to its left and n − j elements to
its right. To reconstruct a_1 a_2 ... a_n from B_1 B_2 ... B_n, start with the element 1; then
for k = 2, ..., n add one to each element ≥ k − B_k and append k − B_k at the right.
(See Method 2 in Section 1.2.5.) A similar procedure works for the C’s. Alternatively,
we could use the result of the following exercise. [The c inversion table was discussed
by Rodrigues, J. de Math. 4 (1839), 236–240. The C inversion table was used by Rothe
in 1800; see also Netto’s Lehrbuch der Combinatorik (1901), §5.]
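The reconstruction from the B table, written out directly. Like the method cited, this sketch takes O(n²) steps.

```python
def permutation_from_B(B):
    """Rebuild a_1 ... a_n from B_1 ... B_n, where B_j = #{i < j : a_i > a_j}."""
    a = []
    for k, Bk in enumerate(B, start=1):
        v = k - Bk                          # a_k is the (k - B_k)th smallest so far
        a = [y + 1 if y >= v else y for y in a]
        a.append(v)
    return a
```

For 5 9 1 8 2 6 4 7 3 the B table is 0 0 2 1 3 2 4 2 6, and the procedure rebuilds the permutation from it.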

8. b' = C, c' = B, B' = c, C' = b, since each inversion (a_i, a_j) of a_1 ... a_n corresponds
to the inversion (j, i) of a'_1 ... a'_n. Some further relations: (a) c_j = j − 1 if and only if
(b_i > b_j for all i < j); (b) b_j = n − j if and only if (c_i > c_j for all i > j); (c) b_j = 0 if and
only if (c_i − i < c_j − j for all i > j); (d) c_j = 0 if and only if (b_i + i < b_j + j for all i < j);
(e) b_i ≤ b_{i+1} if and only if a'_i < a'_{i+1}, if and only if c_i ≥ c_{i+1}; (f) a_j = j + C_j − B_j;
a'_j = j + b_j − c_j.
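The following is a brute-force check of the duality b' = C, c' = B, B' = c, C' = b, together with relations (e) and (f), over all permutations of small size. The helper names are illustrative, and each table is computed straight from its definition.

```python
from itertools import permutations


def tables(a):
    """(b, c, B, C) for a permutation a of {1..n}, as 0-indexed lists."""
    n = len(a)
    pos = {v: i for i, v in enumerate(a)}
    b = [sum(1 for i in range(pos[v]) if a[i] > v) for v in range(1, n + 1)]
    c = [sum(1 for i in range(pos[v] + 1, n) if a[i] < v) for v in range(1, n + 1)]
    B = [sum(1 for i in range(j) if a[i] > a[j]) for j in range(n)]
    C = [sum(1 for i in range(j + 1, n) if a[i] < a[j]) for j in range(n)]
    return b, c, B, C


def inverse(a):
    r = [0] * len(a)
    for i, v in enumerate(a, start=1):
        r[v - 1] = i
    return r


def relations_hold(n):
    for a in permutations(range(1, n + 1)):
        b, c, B, C = tables(a)
        ai = inverse(a)
        if tables(ai) != (C, B, c, b):                       # b' = C, c' = B, B' = c, C' = b
            return False
        for j in range(1, n + 1):                            # relation (f)
            if a[j - 1] != j + C[j - 1] - B[j - 1] or ai[j - 1] != j + b[j - 1] - c[j - 1]:
                return False
        for i in range(1, n):                                # relation (e)
            if (b[i - 1] <= b[i]) != (ai[i - 1] < ai[i]):
                return False
            if (ai[i - 1] < ai[i]) != (c[i - 1] >= c[i]):
                return False
    return True
```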

9.  b = C = b'  is  equivalent  to  a = a'. 
10. VW. (One way to coordinatize the truncated octahedron lets the respective
vectors (1,0,0), (0,1,0), ±½(1,1,√2), ±½(1,−1,√2), ±½(−1,1,√2), ±½(−1,−1,√2) stand
for adjacent interchanges of the respective pairs 21, 43, 41, 31, 42, 32. The sum of
these vectors gives (1,1,2√2) as the difference between vertices 4321 and 1234.)

A more symmetric solution is to represent vertex π in four dimensions by

$$\sum\{\,e_u-e_v \mid (u,v)\text{ is an inversion of }\pi\,\},$$

where e_1 = (1,0,0,0), e_2 = (0,1,0,0), e_3 = (0,0,1,0), e_4 = (0,0,0,1). Thus, 1234 ↔
(0,0,0,0); 1243 ↔ (0,0,−1,1); ...; 4321 ↔ (−3,−1,1,3). All points lie on the
three-dimensional subspace {(w,x,y,z) | w + x + y + z = 0}; the distance between adjacent
vertices is √2. Equivalently (see answer 8(f)) we may represent π = a_1 a_2 a_3 a_4 by
the vector (a'_1, a'_2, a'_3, a'_4), where a'_1 a'_2 a'_3 a'_4 is the inverse permutation. (This 4-D
representation of the truncated octahedron with permutations as coordinates was discussed
together with its n-dimensional generalization by C. Howard Hinton in The Fourth
Dimension (London, 1904), Chapter 10. Further properties were found many years later
by Guilbaud and Rosenstiehl, who called Fig. 1 the “permutahedron”; see exercise 12.)
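The four-dimensional representation can be verified numerically. This sketch computes Σ(e_u − e_v) over the inversions of each permutation of {1, 2, 3, 4}. It then checks that the coordinates sum to zero and that neighbors differing by an adjacent interchange are at squared distance 2.

```python
from itertools import permutations


def vertex(pi):
    """Sum of e_u - e_v over the inversions (u, v) of pi (u > v, u to the left of v)."""
    n = len(pi)
    coord = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if pi[i] > pi[j]:
                coord[pi[i] - 1] += 1
                coord[pi[j] - 1] -= 1
    return tuple(coord)


def check_permutahedron(n):
    """Coordinates sum to 0; adjacent interchanges move a point by distance sqrt(2)."""
    for p in permutations(range(1, n + 1)):
        v = vertex(p)
        if sum(v) != 0:
            return False
        for i in range(n - 1):
            q = list(p)
            q[i], q[i + 1] = q[i + 1], q[i]
            if sum((s - t) ** 2 for s, t in zip(v, vertex(q))) != 2:
                return False
    return True
```

An adjacent interchange adds or removes exactly one inversion (u, v), so it changes the point by ±(e_u − e_v), a vector of length √2.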

Replicas of the truncated octahedron will fill three-dimensional space in what has
been called the simplest possible way [see H. Steinhaus, Mathematical Snapshots (Oxford,
1960), 200–203; C. S. Smith, Scientific American 190, 1 (January 1954), 58–64].
Book V of Pappus’s Collection (c. A.D. 300) mentions the truncated octahedron as
one of 13 special solid figures studied by Archimedes. Illustrations of the Archimedean
solids (the nonprism polyhedra that have symmetries taking any vertex into any other,
and whose faces are regular polygons but not all identical) can be found, for example,
in books by W. W. Rouse Ball, Mathematical Recreations and Essays, revised by
H. S. M. Coxeter (Macmillan, 1939), Chapter 5; H. Martyn Cundy and A. P. Rollett,
Mathematical Models (Oxford, 1952), 94–109.

11. (a) Obvious. (b) Construct a directed graph with vertices {1, 2, ..., n} and arcs
x → y if either x > y and (x, y) ∈ E or x < y and (y, x) ∈ Ē. If there are no oriented
cycles, this directed graph can be topologically sorted, and the resulting linear order is
the desired permutation. If there is an oriented cycle, the shortest has length 3, since
there are none of length 1 or 2 and since a longer cycle a_1 → a_2 → a_3 → a_4 → ··· → a_1
can be shortened (either a_1 → a_3 or a_3 → a_1). But an oriented cycle of length 3
contains two arcs of either E or Ē, and proves that E or Ē is not transitive after all.
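Part (b) can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The sketch assumes E is given as a set of pairs (x, y) with x > y, and it reads Ē as the remaining pairs (x, y) with x > y. The method `static_order()` raises `CycleError` exactly when the directed graph has an oriented cycle.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import permutations


def inversion_set(perm):
    pos = {v: i for i, v in enumerate(perm)}
    return {(x, y) for x in perm for y in perm if x > y and pos[x] < pos[y]}


def permutation_with_inversions(n, E):
    """Return the permutation pi of {1..n} with E(pi) = E, or None if none exists."""
    preds = {v: set() for v in range(1, n + 1)}     # preds[y] = all x with an arc x -> y
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            if (x > y and (x, y) in E) or (x < y and (y, x) not in E):
                preds[y].add(x)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
```

With E = {(3, 1)} on {1, 2, 3}, the arcs 3 → 1, 1 → 2, 2 → 3 form a 3-cycle. This reflects the fact that Ē = {(2, 1), (3, 2)} is not transitive, so no permutation exists.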

12. [G. T. Guilbaud and P. Rosenstiehl, Math. et Sciences Humaines 4 (1963), 9–33.]
Suppose that (a, b) ∈ Ē, (b, c) ∈ Ē, (a, c) ∉ Ē. Then for some k ≥ 1 we have
a = x_0 > x_1 > ··· > x_k = c, where (x_i, x_{i+1}) ∈ E(π_1) ∪ E(π_2) for 0 ≤ i < k.
Consider a counterexample of this type where k is minimal. Since (a, b) ∉ E(π_1) and
(b, c) ∉ E(π_1), we have (a, c) ∉ E(π_1), and similarly (a, c) ∉ E(π_2); hence k > 1. But
if x_1 > b, then (x_1, b) ∈ Ē contradicts the minimality of k, while (x_1, b) ∈ E implies
that (a, b) ∈ E. Similarly, if x_1 < b, both (b, x_1) ∈ E and (b, x_1) ∈ Ē are impossible.

13. For any fixed choice of b_1, ..., b_{n−m}, b_{n−m+2}, ..., b_n in the inversion table, the
total Σ b_j will assume each possible residue modulo m exactly once as b_{n−m+1} runs
through its possible values 0, 1, ..., m − 1.
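Here is a brute-force confirmation for small n, assuming m ≤ n so that b_{n−m+1} exists. Every residue class of inv modulo m contains exactly n!/m permutations.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def inversions(a):
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def residue_counts(n, m):
    """Number of permutations of {1..n} with inv = r (mod m), for r = 0..m-1."""
    counts = Counter(inversions(p) % m for p in permutations(range(1, n + 1)))
    return [counts[r] for r in range(m)]
```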

14. The hinted construction takes pairs of distinct-part partitions into each other,
except in the two cases j = k = p_k and j = k = p_k − 1. In the exceptional cases, n is

(2j − 1) + ··· + j = (3j² − j)/2 and (2j) + ··· + (j + 1) = (3j² + j)/2, respectively,

and there is a unique unpaired partition with j parts. [Comptes Rendus Acad. Sci.
92 (Paris, 1881), 448–450. Euler’s original proof, in Novi Comment. Acad. Sci. Pet. 5
(1754), 75–83, was also very interesting. He showed by simple manipulations that the
infinite product equals s_1, if we define s_n as the power series 1 − z^{2n−1} − z^{3n−1} s_{n+1},
for n ≥ 1. Finite versions of Euler’s infinite sum are discussed by Knuth and Paterson
in Fibonacci Quarterly 16 (1978), 198–212.]
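Euler's pentagonal number theorem, which the construction proves, can be checked by expanding the product as a truncated power series. The truncation degree N is an arbitrary parameter of this sketch.

```python
def euler_product(N):
    """Coefficients of prod_{k>=1} (1 - z^k), truncated after z^N."""
    coeff = [1] + [0] * N
    for k in range(1, N + 1):
        for i in range(N, k - 1, -1):     # multiply by (1 - z^k) in place
            coeff[i] -= coeff[i - k]
    return coeff


def pentagonal_series(N):
    """Coefficients of sum_j (-1)^j z^{j(3j-1)/2} over all integers j, up to z^N."""
    coeff = [0] * (N + 1)
    j = 0
    while j * (3 * j - 1) // 2 <= N:
        for e in {j * (3 * j - 1) // 2, j * (3 * j + 1) // 2}:   # j and -j
            if e <= N:
                coeff[e] += (-1) ** j
        j += 1
    return coeff
```

The exponents (3j² ∓ j)/2 are exactly the two exceptional values of n above. For example, pentagonal_series(10) is [1, −1, −1, 0, 0, 1, 0, 1, 0, 0, 0].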

15.  Transpose  the  dot  diagram,  to  go  from  the  p’s  to  the  P’s.  The  generating  function 
for the P’s is easily obtained, since we first choose any number of 1s (generating function
1/(1  — z)),  then  independently  choose  any  number  of  2s  (generating  function  1/(1  — z2)), 
. . . , finally  any  number  of  n’s. 

16. The coefficient of z^n q^m in the first identity is the number of partitions of m into
at most n parts. In the second identity it is the number of partitions of m into n
distinct nonnegative parts, namely sums of the form m = p_1 + p_2 + ··· + p_n, where
p_1 > p_2 > ··· > p_n ≥ 0. This is the same as m − $\binom{n}{2}$ = q_1 + q_2 + ··· + q_n, where
q_1 ≥ q_2 ≥ ··· ≥ q_n ≥ 0, under the correspondence q_i = p_i − n + i. [Commentarii
Academiae Scientiarum Petropolitanae 13 (1741), 64–93.]

Notes: The second identity is the limit as n → ∞ of the q-nomial theorem, exercise
1.2.6–58. The first identity, similarly, is the limit as r → ∞ of the dual form of that
theorem, proved in the answer to that exercise.

Let n!_q = ∏_{k=1}^{n}(1 + q + ··· + q^{k−1}), and let exp_q(z) = Σ_n z^n/n!_q. The first
identity tells us that exp_q(z) is equal to 1/∏_{k≥0}(1 − q^k z(1 − q)) when |q| < 1; the
second tells us that it equals ∏_{k≥0}(1 + q^{−k} z(1 − q^{−1})) when |q| > 1. The resulting
formal power series identity exp_q(z) exp_{q^{−1}}(−z) = 1 is equivalent to the formula

$$\sum_{k=0}^{n}\frac{(-1)^k\,q^{\binom{k}{2}}}{(1-q)\ldots(1-q^k)\,(1-q)\ldots(1-q^{n-k})}=0,\qquad\text{integer } n>0,$$

which is a consequence of the q-nomial theorem with x = −1.
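The final formula can be spot-checked with exact rational arithmetic. The sample values q = 2/7 and q = 3 (the latter covering |q| > 1) are arbitrary choices. After denominators are cleared, the identity is a polynomial identity, so it holds for any q that is not a root of unity.

```python
from fractions import Fraction


def q_sum(n, q):
    """sum_{k=0}^{n} (-1)^k q^(k(k-1)/2) / ((1-q)...(1-q^k) (1-q)...(1-q^(n-k)))."""
    def qfactor(k):                 # (1 - q)(1 - q^2)...(1 - q^k)
        prod = Fraction(1)
        for i in range(1, k + 1):
            prod *= 1 - q**i
        return prod
    return sum((-1)**k * q**(k * (k - 1) // 2) / (qfactor(k) * qfactor(n - k))
               for k in range(n + 1))
```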
18. Let q = 1 − p. The sum Σ Pr(α) over all instances α of inversions may be
evaluated by summing on k, where 0 ≤ k < n is the exact number of leftmost
bit positions in which there is equality between i and j as well as between x_i and
x_j, in an inversion x_i ⊕ i > x_j ⊕ j for i < j. In this way we obtain the formula

$$\sum_{0\le k<n}2^k(p^2+q^2)^k\bigl(p^2\,2^{n-k-1}2^{n-k-1}+2pq\,2^{n-k-1}(2^{n-k-1}-1)\bigr);$$

summing and simplifying yields

$$2^{n-1}\Bigl(\frac{p(2-p)\bigl(2^n-(p^2+q^2)^n\bigr)}{2-p^2-q^2}+(p^2+q^2)^n-1\Bigr).$$
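The "summing and simplifying" step can be confirmed with exact fractions. This sketch compares the sum over k with the closed form for a few arbitrarily chosen sample values of p.

```python
from fractions import Fraction


def by_summation(n, p):
    q = 1 - p
    s = p * p + q * q
    return sum(2**k * s**k * (p * p * 2**(n - k - 1) * 2**(n - k - 1)
                              + 2 * p * q * 2**(n - k - 1) * (2**(n - k - 1) - 1))
               for k in range(n))


def closed_form(n, p):
    q = 1 - p
    s = p * p + q * q
    return 2**(n - 1) * (p * (2 - p) * (2**n - s**n) / (2 - s) + s**n - 1)
```

The simplification uses p² + 2pq = p(2 − p) and 1 − (p² + q²) = 2pq.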

19. The number of inversions is

$$\sum_{0\le i<j<n}\bigl(\lfloor mj/n\rfloor-\lfloor mi/n\rfloor-\lfloor m(j-i)/n\rfloor\bigr)=\sum_{0\le i<j<n}[\,mj\bmod n<mi\bmod n\,]=\sum_{0\le r<n}\lfloor mr/n\rfloor\bigl(r-(n-r)-(n-r-1)\bigr),$$

which can be transformed to ¼(n − 1)(n − 2) − ¼nσ(m, n, 0). [Crelle 198 (1957), 162–166.]
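All three expressions, and the closed form, can be compared with a direct count of inversions of the sequence (mi mod n) for 0 ≤ i < n. The sketch uses the Dedekind sum σ(m, n, 0) = 12 Σ_{0≤j<n} ((j/n))((mj/n)), where ((x)) is the sawtooth function x − ⌊x⌋ − ½ (and 0 at integers). It restricts to gcd(m, n) = 1, where i ↦ mi mod n is a permutation.

```python
from fractions import Fraction
from math import floor, gcd


def sawtooth(x):
    """((x)) = x - floor(x) - 1/2, except ((x)) = 0 when x is an integer."""
    return Fraction(0) if x.denominator == 1 else x - floor(x) - Fraction(1, 2)


def dedekind_sigma(m, n):
    """sigma(m, n, 0) = 12 * sum_{0 <= j < n} ((j/n)) ((m j / n))."""
    return 12 * sum(sawtooth(Fraction(j, n)) * sawtooth(Fraction(m * j, n))
                    for j in range(n))


def inversion_forms(m, n):
    """Direct count, the three sums of answer 19, and the closed form."""
    v = [m * i % n for i in range(n)]
    direct = sum(1 for i in range(n) for j in range(i + 1, n) if v[i] > v[j])
    floors = sum(m * j // n - m * i // n - m * (j - i) // n
                 for i in range(n) for j in range(i + 1, n))
    brackets = sum(1 for i in range(n) for j in range(i + 1, n)
                   if m * j % n < m * i % n)
    by_r = sum((m * r // n) * (r - (n - r) - (n - r - 1)) for r in range(n))
    closed = Fraction((n - 1) * (n - 2), 4) - Fraction(n, 4) * dedekind_sigma(m, n)
    return direct, floors, brackets, by_r, closed
```

For instance, m = 3 and n = 8 give the sequence 0 3 6 1 4 7 2 5 with 9 inversions, and σ(3, 8, 0) = 3/4.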

20.  See  J.  J.  Sylvester,  Amer.  J.  Math.  5 (1882),  251-330,  6 (1883),  334-336,  §57-§68; 
E.  M.  Wright,  J.  London  Math.  Soc.  40  (1965),  55-57;  and  J.  Zolnowsky,  Discrete 
Math.  9 (1974),  293-298. 

Jacobi’s identity can be proved rapidly as follows. Since

$$\prod_{k=1}^{n}(1-u^kv^{k-1})=(-1)^n u^{\binom{n+1}{2}}v^{\binom{n}{2}}\prod_{k=1}^{n}(1-u^{-k}v^{1-k}),$$

the q-nomial theorem of exercise 1.2.6–58 with q = uv tells us that

$$\prod_{k=1}^{n}(1-u^kv^{k-1})(1-u^{k-1}v^k)=(-1)^n u^{\binom{n+1}{2}}v^{\binom{n}{2}}\prod_{k=1-n}^{n}(1-u^{k-1}v^k)$$

$$=(-1)^n u^{\binom{n+1}{2}}v^{\binom{n}{2}}\sum_k\binom{2n}{k}_{uv}(uv)^{\binom{k}{2}}\bigl(-u^{-n}v^{1-n}\bigr)^k.$$

Multiply both sides by ∏_{k=1}^{n}(1 − u^k v^k) = ∏_{k=1}^{n}(1 − q^k) and note that, for fixed j, we
have $\binom{2n}{n+j}_q\prod_{k=1}^{n}(1-q^k)=1+O(q^{n+1-|j|})$. Jacobi’s identity follows as n → ∞.
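Here is a floating-point spot check of the identity in the form this derivation produces: ∏_{k≥1}(1 − u^k v^{k−1})(1 − u^{k−1} v^k)(1 − u^k v^k) = Σ_j (−1)^j u^{j(j−1)/2} v^{j(j+1)/2}. The sample values of u and v, with 0 < u, v < 1, and the truncation at 200 terms are arbitrary choices.

```python
def jacobi_sides(u, v, terms=200):
    """Truncated left and right sides of Jacobi's triple product identity."""
    left = 1.0
    for k in range(1, terms + 1):
        left *= (1 - u**k * v**(k - 1)) * (1 - u**(k - 1) * v**k) * (1 - u**k * v**k)
    right = sum((-1)**j * u**(j * (j - 1) // 2) * v**(j * (j + 1) // 2)
                for j in range(-terms, terms + 1))
    return left, right
```

Setting u = v = x recovers Gauss's identity ∏(1 − x^{2k−1})²(1 − x^{2k}) = Σ_j (−1)^j x^{j²}.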


21.  Interpret  Cj  as  the  number  of  elements  on  the  stack  after  the  jth  output.  (See 
exercise  2.3.3-19  for  characterizations  of  the  b and  B tables  of  stack  permutations.) 

22. (a) Arrange the numbers {1, 2, ..., n} in a circle as on the face of a clock, and point
at 1. Then for j = n, n − 1, ..., 1 (in this order), move the pointer counterclockwise
h_j + 1 steps, remove the number pointed to from the circle, and call it a_j.

(b) Each i is counted as often as the sequence a_i a_{i+1} ... a_n wraps around; this
is the number of times that a_j > a_{j+1} for j ≥ i. Therefore each j with a_j > a_{j+1}
corresponds to the indices 1, ..., j being counted once. [Guo-Niu Han, Advances in
Math. 105 (1994), 28–29; an equivalent result had been obtained by Rawlings, in the
context of the next exercise.]


23. Suppose, for example, that n = 5 and a_1 a_2 a_3 a_4 a_5 = 3 1 4 2 5. The number of
missed shots before each death must then be 2 + 5k_1, 2 + 4k_2, 1 + 3k_3, 1 + 2k_4, k_5,
for some nonnegative integers k_j. Note that the dual permutation 1 4 2 5 3 has h-table
0 1 1 2 2 in the notation of the previous exercise. In general, the probability of obtaining
a_1 a_2 ... a_n will be

$$\sum_{k_1,\ldots,k_n\ge0}\bigl(q_1^{h_n+nk_1}p_1\bigr)\bigl(q_2^{h_{n-1}+(n-1)k_2}p_2\bigr)\ldots\bigl(q_n^{h_1+k_n}p_n\bigr)=\frac{1-q_1}{1-q_1^{\,n}}\,\frac{1-q_2}{1-q_2^{\,n-1}}\ldots\frac{1-q_n}{1-q_n}\;q_1^{h_n}q_2^{h_{n-1}}\ldots q_n^{h_1},$$

where p_j = 1 − q_j is the probability of fatality after j − 1 deaths, and h_1 h_2 ... h_n
corresponds to the dual of a_1 a_2 ... a_n. In particular, when p_1 = ··· = p_n = p =
1 − q, the probability is q^{h_1+···+h_n}/G_n(q). The least likely order is therefore n ... 2 1.
[J. Treadway and D. Rawlings, Math. Mag. 67 (1994), 345–354; Rawlings generalized
the process to multiset permutations in Int. J. Math. & Math. Sci. 15 (1992), 291–312.]

24. Let a_0 = 0, and say that a generalized descent occurs at j < n if a_j > t(a_{j+1}).
Inserting n between a_{j−1} and a_j causes a new generalized descent if and only if a_{j−1} <
t(a_j) < n. Suppose this occurs when j has the values j_1 > j_2 > ··· > j_k > 0; let the
other values of j be j_n > j_{n−1} > ··· > j_{k+1}. Then j_n = n, and it can be shown that
the generalized index increases by n − k when n is inserted just before a_{j_k}. [The special
case in which t(j) = j + d for some d > 0 is due to D. Rawlings, J. Combinatorial
Theory A31 (1981), 175–183; he generalized this special case to multiset permutations
in Linear and Multilinear Algebra 10 (1981), 253–260.]

This exercise defines n! different statistics on permutations, each of which has
the generating function G_n(z) that appears in (7) and (8). We can define many
more such statistics by generalizing Russian roulette as follows: After j − 1 deaths,
the person who begins the next round of shooting is f_j(a_1, ..., a_{j−1}), where f_j is an
arbitrary function taking values in {1, ..., n} \ {a_1, ..., a_{j−1}}. [See Guo-Niu Han, Calcul
Denertien (Thesis, Univ. Strasbourg, 1992), Part 1.3, §7.]

25. (a) If a_1 < a_n, h(α) has exactly as many inversions as α, because the elements of
α_j now invert x_j instead of a_n. But if a_1 > a_n, h(α) has n − 1 fewer inversions, because
x_j loses its inversion of a_n and of each element in α_j. Therefore if we set x_n = a_n and
recursively let x_1 ... x_{n−1} = f(h(α)), the permutation f(α) = x_1 ... x_n has the desired
properties. We have f(198263745) = 912638745, and f^{[−1]}(198263745) =
192687345.

(b) The key point is that inv(α) = inv(α⁻) and ind(α⁻) = ind(f(α)⁻), when α⁻
is the inverse of α. Therefore if α_1 = α⁻, α_2 = f(α_1), α_3 = α_2⁻, α_4 = f^{[−1]}(α_3), and
α_5 = α_4⁻, we have

inv(α_5) = inv(α_4) = ind(α_3) = ind(α_2⁻) = ind(α_1⁻) = ind(α);

ind(α_5) = ind(α_4⁻) = ind(α_3⁻) = ind(α_2) = inv(α_1) = inv(α).

[Math. Nachrichten 83 (1978), 143–159.]
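Part (b) implies that the pairs (inv(α), ind(α)) and (ind(α), inv(α)) have the same distribution over all permutations. This brute-force sketch confirms that symmetry for small n. It also confirms the equidistribution of (inv(α), ind(α)) with (ind(α⁻), ind(α)), which answer 27 relies on. It does not implement the bijection f itself.

```python
from collections import Counter
from itertools import permutations


def inv(a):
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def ind(a):
    """The index: sum of all j with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(a)) if a[j - 1] > a[j])


def inverse(a):
    r = [0] * len(a)
    for i, v in enumerate(a, start=1):
        r[v - 1] = i
    return tuple(r)


def joint(n):
    """Joint distribution of (inv, ind) over the permutations of {1..n}."""
    return Counter((inv(a), ind(a)) for a in permutations(range(1, n + 1)))


def is_symmetric(counter):
    return all(counter[(y, x)] == c for (x, y), c in counter.items())
```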

26. (Solution by Doron Zeilberger.) The average of inv(α) ind(α) is

$$\frac{1}{n!}\sum_{\alpha}\ \sum_{1\le j<k\le n}[a_j>a_k]\sum_{1\le l<n}l\,[a_l>a_{l+1}],$$

which is a polynomial in n of degree ≤ 4. Evaluating this sum for 1 ≤ n ≤ 5 gives the
respective values 0, 1/2, 3, 21/2, 55/2; so the polynomial must be (1/8)n(n − 1) + (1/16)n²(n − 1)².
Subtracting mean(g_n)² and dividing by var(g_n) gives the answer 9/(2n + 5) for n ≥ 2,
by (12) and (13).
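The polynomial and the correlation 9/(2n + 5) can be confirmed by brute force for small n, using exact fractions. The sketch uses the usual mean n(n − 1)/4 and variance n(n − 1)(2n + 5)/72 of the inversion count, which the index shares.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def inv(a):
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def ind(a):
    """The index: sum of all j with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(a)) if a[j - 1] > a[j])


def mean_inv_ind(n):
    total = sum(inv(a) * ind(a) for a in permutations(range(1, n + 1)))
    return Fraction(total, factorial(n))


def correlation(n):
    mean = Fraction(n * (n - 1), 4)                 # mean of inv (and of ind)
    var = Fraction(n * (n - 1) * (2 * n + 5), 72)   # variance of inv (and of ind)
    return (mean_inv_ind(n) - mean * mean) / var
```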

27. We have inv(a_1 a_2 ... a_n) = inv(q_n ... q_2 q_1), when q_n ... q_1 is regarded as a
permutation of a multiset (see Section 5.1.2). It follows that

$$\frac{H_n(w,z)}{(1-z)\ldots(1-z^n)}=\sum_{a_1\ldots a_n}w^{\operatorname{inv}(a_1\ldots a_n)}z^{\operatorname{ind}(a_1\ldots a_n)}\sum_{p_1\ge\cdots\ge p_n\ge0}z^{p_1+\cdots+p_n}$$

$$=\sum_{q_1,\ldots,q_n\ge0}w^{\operatorname{inv}(q_n\ldots q_2q_1)}z^{q_1+q_2+\cdots+q_n}$$

$$=\sum_{\substack{k_0,k_1,k_2,\ldots\ge0\\k_0+k_1+k_2+\cdots=n}}\binom{n}{k_0,k_1,k_2,\ldots}_{\!w}z^{k_1+2k_2+\cdots}$$

$$=n!_w\,[u^n]\sum_{k_0,k_1,k_2,\ldots}\ \prod_{j\ge0}\frac{(z^ju)^{k_j}}{k_j!_w}$$

$$=n!_w\,[u^n]\prod_{j\ge0}\exp_w(z^ju)$$

$$=n!_w\,[u^n]\prod_{j\ge0}\prod_{k\ge0}\frac{1}{1-z^jw^ku(1-w)},$$

using the notation of answer 16 and the result of exercise 5.1.2–16. Thus we have the
elegant identity

$$\prod_{j,k\ge0}\frac{1}{1-w^jz^ku}=\sum_{n\ge0}\frac{H_n(w,z)\,u^n}{(1-w)(1-w^2)\ldots(1-w^n)\,(1-z)(1-z^2)\ldots(1-z^n)},$$

which was established for the generating function H_n(w, z) = Σ_α w^{ind(α⁻)} z^{ind(α)} by
D. P. Roselle in Proc. Amer. Math. Soc. 45 (1974), 144–150. Exercise 25 shows that
the same bivariate generating function counts indexes and inversions. The proof given
here is due to Garsia and Gessel [Advances in Math. 31 (1979), 288–305], who went on
to obtain considerably more general results.

Setting m = ∞ in exercise 4.7–27 leads to the recurrence

Hn(w,z)  = (II/1  ~ zU~3))Hn-k(w,z). 

28. Interchanging two adjacent elements changes the total displacement by 0 or ±2;
hence td(a_1 a_2 ... a_n) ≤ 2 inv(a_1 a_2 ... a_n).

We can also prove that td(a_1 a_2 ... a_n) ≥ inv(a_1 a_2 ... a_n). Suppose j is the
smallest element out of place, and let a_k = j. Let l be maximum with l < k and
a_l ≥ k. Interchanging a_l with a_k reduces the inversions by 2(k − l) − 1, and reduces the
total displacement by 2(k − l). Therefore if m repetitions of this algorithm are needed to
sort a given permutation a_1 a_2 ... a_n, we have td(a_1 a_2 ... a_n) = inv(a_1 a_2 ... a_n) + m.

The average total displacement of a random permutation is (n² − 1)/3; see exercise
5.2.1–7. The generating function for total displacement does not appear to have
a simple form. References: C. Spearman, British J. Psychology 2 (1906), 89–108;
P. Diaconis and R. L. Graham, J. Royal Stat. Soc. B39 (1977), 262–268.
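The exchange procedure of this answer is easy to simulate. The sketch below counts the repetitions m and checks td = inv + m and the bounds inv ≤ td ≤ 2 inv. It also checks the average (n² − 1)/3, all by brute force over small n.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def inv(a):
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])


def td(a):
    """Total displacement: the sum of |a_j - j|."""
    return sum(abs(v - j) for j, v in enumerate(a, start=1))


def exchanges_to_sort(a):
    """Repeat the interchange of answer 28 until sorted; return the count m."""
    a = list(a)
    m = 0
    while any(v != j for j, v in enumerate(a, start=1)):
        j = min(v for i, v in enumerate(a, start=1) if v != i)  # smallest element out of place
        k = a.index(j) + 1                                      # a_k = j
        l = max(i for i in range(1, k) if a[i - 1] >= k)        # largest l < k with a_l >= k
        a[l - 1], a[k - 1] = a[k - 1], a[l - 1]
        m += 1
    return m


def average_td(n):
    return Fraction(sum(td(a) for a in permutations(range(1, n + 1))), factorial(n))
```

A suitable l always exists. Positions j through k − 1 hold k − j elements larger than j, but only k − j − 1 values lie strictly between j and k.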

29. We can obtain π as a product of inv(π) transpositions τ_j, where τ_j interchanges j
and j + 1. For example, the path 1234 → 1324 → 1342 → 3142 in Fig. 1 corresponds
to τ_2, then τ_3, then τ_1; hence 3142 = τ_1τ_3τ_2. Therefore ππ' is obtainable from π' by
making inv(π) transpositions, each of which changes the number of inversions by ±1.
It follows that inv(ππ') ≤ inv(π) + inv(π'). If equality holds, each transposition adds
a new inversion, hence E(ππ') ⊇ E(π').

Conversely, if E(ππ') ⊇ E(π'), we want to show that some sequence of |E(ππ')| −
|E(π')| = inv(ππ') − inv(π') transpositions will transform π' to ππ'. Such transpositions
define π, so this will prove that inv(π) ≤ inv(ππ') − inv(π'); hence equality must
hold. Suppose, for example, that π' = 314592687 and that E(ππ') ⊇ E(π'). If
E(ππ') does not contain (4,1) or (5,4) or (9,5) or (6,2) or (8,6), then ππ' must be
equal to π'. Otherwise E(ππ') contains one of them, say (9,5); then E(ππ') contains
E(τ_4π') = E(3 1 4 9 5 2 6 8 7). In this way we can prove the result by induction on
|E(ππ')| − |E(π')|.
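Here is a brute-force check of the criterion for small n. The sketch assumes the left-to-right product convention (ππ')(i) = π'(π(i)), under which the example 3142 = τ_1τ_3τ_2 holds. Inversion sets E are taken as sets of pairs (u, v) with u > v and u to the left of v.

```python
from itertools import permutations


def product(x, y):
    """Left-to-right product: (x y)(i) = y(x(i)), permutations as tuples of 1..n."""
    return tuple(y[x[i] - 1] for i in range(len(x)))


def inversion_set(a):
    pos = {v: i for i, v in enumerate(a)}
    return {(u, v) for u in a for v in a if u > v and pos[u] < pos[v]}


def criterion_holds(n):
    """inv(xy) <= inv(x) + inv(y), with equality exactly when E(xy) contains E(y)."""
    perms = list(permutations(range(1, n + 1)))
    for x in perms:
        for y in perms:
            e_xy, e_x, e_y = inversion_set(product(x, y)), inversion_set(x), inversion_set(y)
            if len(e_xy) > len(e_x) + len(e_y):
                return False
            if (len(e_xy) == len(e_x) + len(e_y)) != (e_xy >= e_y):
                return False
    return True
```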

SECTION  5.1.2 

1. False, because of a reasonably important technicality. If you said “true,” you
probably didn’t know the definition of M_1 ∪ M_2 given in Section 4.6.3, which has the
property that M_1 ∪ M_2 is a set whenever M_1 and M_2 are sets. Actually, α ⊺ β is a
permutation of M_1 ⊎ M_2.

2.  bcaddadadb. 

3. Certainly not, since we may have α = β. (The unique factorization theorem shows
that there aren’t too many possibilities, however.)

4. (d) ⊺ (b c d) ⊺ (b b c a d) ⊺ (b a b c d) ⊺ (d).

5. The number of occurrences of the pair ...x x... is equal to the number of $\binom{x}{x}$
columns, minus 0 or 1. When x is the smallest element, the numbers of occurrences
are equal if and only if x is not first in the permutation.
6. Counting the associated number of two-line arrays is easy: (Jf) (£) .

7.  Using  part  (a)  of  Theorem  B,  a derivation  like  that  of  (20)  gives 


A — 


A-  1 \ /B\  /C\  / B + k\  (C-k\ 

k — m — 1 J \m)  \B  — l ) \ l ) 


( A-  1 \ /B\  /C\  / B + k — 1\  (C-k\ 

\A  — k — m J \m/  \k  ) \ B — l — 1 ) \ l ) 

( A-  1 \ / B \ /C\  (B  + k-  1\  (C-k\ 

\A  — k — m)  \m/  \ B — l ) \ l ) 


8. The complete factorization into primes is (d) ⊺ (b c d) ⊺ (b) ⊺ (a d b c) ⊺ (a b) ⊺ (b c d) ⊺ (d),
which is unique since no adjacent pairs commute. So there are eight solutions, with
α = ε, (d), (d) ⊺ (b c d), …

10.  False,  but  true  in  interesting  cases.  Given  any  linear  ordering  of  the  primes, 
there  is  at  least  one  factorization  of  the  stated  form,  since  whenever  the  condition  is 
violated  we  can  make  an  interchange  that  reduces  the  number  of  “inversions”  in  the 
factorization.  So  the  condition  fails  only  because  some  permutations  have  more  than 
one  such  factorization. 

Let ρ ~ σ mean that ρ commutes with σ. The following condition is necessary
and sufficient for the uniqueness of the factorization as stated:

ρ ~ σ ~ τ and ρ ≺ σ ≺ τ implies ρ ~ τ.

Proof. If ρ ~ σ ~ τ and ρ ≺ σ ≺ τ and ρ ≁ τ, we would have two factorizations
σ ⊺ τ ⊺ ρ = τ ⊺ ρ ⊺ σ; hence the condition is necessary. Conversely, to show that it is sufficient
for uniqueness, let ρ_1 ⊺ ··· ⊺ ρ_n = σ_1 ⊺ ··· ⊺ σ_n be two distinct factorizations satisfying
the condition. We may assume that σ_1 ≠ ρ_1, and hence σ_1 = ρ_k for some k > 1;
furthermore σ_1 ~ ρ_j for 1 ≤ j < k. Since ρ_{k−1} ~ σ_1 = ρ_k, we have ρ_{k−1} ≺ σ_1; hence
k > 2. Let j be such that σ_1 ≺ ρ_j and ρ_i ≺ σ_1 for j < i < k. Then ρ_{j+1} ~ σ_1 ~ ρ_j
and ρ_{j+1} ≺ σ_1 ≺ ρ_j implies that ρ_{j+1} ~ ρ_j; hence ρ_j ≺ ρ_{j+1}, a contradiction.

Therefore  if  we  are  given  an  ordering  relation  on  a set  S of  primes,  satisfying  the 
condition  above,  and  if  we  know  that  all  prime  factors  of  a permutation  n belongs  to  S, 
we  can  conclude  that  7r  has  a unique  factorization  of  the  stated  type.  Such  a condition 
holds,  for  example,  when  S is  the  set  of  cycles  in  (29). 

But  the  set  of  all  primes  cannot  be  so  ordered.  For  if  we  have,  say,  (a  b)  < (d  e), 
then  we  are  forced  to  define 


(a  b)  -<  (d  e)  >-  (6  c)  -<  (e  a)  y (c  d)  -<  (a  b)  y (d  e), 
a contradiction.  (See  also  the  following  exercise.) 

11. We wish to show that, if p(1) ... p(t) is a permutation of {1, ..., t}, the permutation x_{p(1)} ... x_{p(t)} is topologically sorted if and only if we have σ_{p(1)}⊺···⊺σ_{p(t)} = σ_1⊺···⊺σ_t and p(i) < p(j) whenever σ_{p(i)} = σ_{p(j)} for i < j. We also want to show that, if x_{p(1)} ... x_{p(t)} and x_{q(1)} ... x_{q(t)} are distinct topological sortings, we have σ_{p(j)} ≠ σ_{q(j)} for some j. The first property follows by observing that x_{p(1)} can be first in a topological sort if and only if σ_{p(1)} commutes with (yet is distinct from) σ_1, ..., σ_{p(1)−1}; and this condition implies that σ_{p(2)}⊺···⊺σ_{p(t)} = σ_1⊺···⊺σ_{p(1)−1}⊺σ_{p(1)+1}⊺···⊺σ_t, so induction can be used. The second property follows because if j is minimal with p(j) ≠ q(j), we have, say, p(j) < q(j), and x_{p(j)} and x_{q(j)} are incomparable by definition of topological sorting; hence σ_{p(j)} has no letters in common with σ_{q(j)}.

To get an arbitrary partial ordering, let the cycle σ_k consist of all ordered pairs (i, j) such that x_i ≺ x_j and either i = k or j = k; these ordered pairs are to appear in some arbitrary order as individual elements of the cycle. Thus the cycles for the partial ordering x_1 ≺ x_2, x_3 ≺ x_4, x_1 ≺ x_4 would be σ_1 = ((1,2) (1,4)), σ_2 = ((1,2)), σ_3 = ((3,4)), σ_4 = ((1,4) (3,4)).

12. No other cycles can be formed, since, for example, the original permutation contains no “ columns. If (a b c d) occurs s times, then (a b) must occur A − r − s times, since there are A − r columns ${a \atop b}$, and only two kinds of cycles contribute to such columns.

13. In the two-line notation, first place A − t columns of the form f, then put the other t a’s in the second line, then place the b’s, and finally the remaining letters.

14. Since the elements below any given letter in the two-line notation for π⁻ are in nondecreasing order, we do not always have (π⁻)⁻ = π; but it is true that ((π⁻)⁻)⁻ = π⁻. In fact, the identity

(α⊺β)⁻ = ((α⁻⊺β⁻)⁻)⁻

holds for all α and β. (See exercise 5-2.)

Given a multiset whose distinct letters are x_1 < ··· < x_m, we can characterize its self-inverse permutations by observing that they each have a unique prime factorization of the form β_1⊺···⊺β_m, where β_j has zero or more prime factors (x_j)⊺···⊺(x_j)⊺(x_j x_{k_1})⊺···⊺(x_j x_{k_t}), j < k_1 ≤ ··· ≤ k_t. For example, (a)⊺(a b)⊺(a b)⊺(b c)⊺(c) is a self-inverse permutation. The number of self-inverse permutations of {m·a, n·b} is therefore min(m, n) + 1; and the corresponding number for {l·a, m·b, n·c} is the number of solutions of the inequalities x + y ≤ l, x + z ≤ m, y + z ≤ n in nonnegative integers x, y, z. The number of self-inverse permutations of a set is considered in Section 5.1.4.

The number of permutations of {n_1·x_1, ..., n_m·x_m} having n_{ij} occurrences of the column ${x_i \atop x_j}$ in their two-line notation is $\prod_i n_i!/\prod_{i,j} n_{ij}!$, the same as the number having n_{ij} occurrences of ${x_j \atop x_i}$ in the two-line notation. Hence there ought to be a better way to define the inverse of a multiset permutation. For example, if the prime factorization of π is σ_1⊺σ_2⊺···⊺σ_t as in Theorem C, we can define π⁻ = σ_t⁻⊺···⊺σ_2⁻⊺σ_1⁻, where (x_1 ... x_n)⁻ = (x_n ... x_1).

Dominique Foata and Guo-Niu Han have observed that it would be even more desirable to define inverses in such a way that π and π⁻ have the same number of inversions, because the generating function for inversions given the numbers n_{ij} is $\prod_i n_i!_z/\prod_{i,j} n_{ij}!_z$ times a power of z; see exercise 16. However, there does not seem to be any natural way to define an involution having that property.

15. See Theorem 2.3.4.2D and Lemma 2.3.4.2E. Removing one arc of the directed graph must leave an oriented tree.

16. If x_1 < x_2 < ···, the inversion table entries for the x_j’s must have the form b_{j1} ≤ ··· ≤ b_{jn_j}, where b_{jn_j} (the number of inversions of the rightmost x_j) is at most n_{j+1} + n_{j+2} + ···. So the generating function for the jth part of the inversion table is the generating function for partitions into at most n_j parts, no part exceeding n_{j+1} + n_{j+2} + ···. The generating function for partitions into at most m parts, no part exceeding n, is the z-nomial coefficient $\binom{m+n}{m}_z$; this is readily proved by induction, and it can also be proved by means of an ingenious construction due to F. Franklin [Amer. J. Math. 5 (1882), 268-269; see also Pólya and Alexanderson, Elemente der Mathematik 26 (1971), 102-109]. Multiplying the generating functions for j = 1, 2, ... gives the desired formula for inversions of multiset permutations, which MacMahon published in Proc. London Math. Soc. (2) 15 (1916), 314-321.
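As a sanity check, the sketch below compares a brute-force count of inversions with the product of z-nomial coefficients $\binom{n_j + n_{j+1} + \cdots}{n_j}_z$ described above. It uses a few small multisets. Polynomials in z are plain coefficient lists. All function names are illustrative choices, not part of the text.

```python
from itertools import permutations


def poly_mul(a, b):
    """Multiply two polynomials in z given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def z_nomial(n, k):
    """Gaussian (z-nomial) coefficient [n choose k]_z as a coefficient list,
    built from the recurrence [n, k] = [n-1, k-1] + z^k [n-1, k]."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    a = z_nomial(n - 1, k - 1)
    b = [0] * k + z_nomial(n - 1, k)
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return [x + y for x, y in zip(a, b)]


def inversion_gf_brute(counts):
    """Coefficients of sum z^inv(p) over the distinct permutations p of the
    multiset with counts[0] copies of 1, counts[1] copies of 2, ..."""
    word = [v for v, c in enumerate(counts, start=1) for _ in range(c)]
    n = len(word)
    gf = [0] * (n * (n - 1) // 2 + 1)
    for p in set(permutations(word)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        gf[inv] += 1
    while len(gf) > 1 and gf[-1] == 0:
        gf.pop()
    return gf


def inversion_gf_formula(counts):
    """Product over j of [n_j + n_{j+1} + ... choose n_j]_z."""
    gf = [1]
    for j, nj in enumerate(counts):
        rest = sum(counts[j + 1:])
        gf = poly_mul(gf, z_nomial(nj + rest, nj))
    return gf
```

For example, inversion_gf_formula([2, 2]) gives [1, 1, 2, 1, 1], the coefficients of 1 + z + 2z² + z³ + z⁴.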

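17. Let h_n(z) = (n!_z)/n!; then the desired probability generating function is

g(z) = h_n(z)/(h_{n_1}(z) h_{n_2}(z) ...).

The mean of h_n(z) is $\frac12\binom{n}{2}$, by Eq. 5.1.1-(12), so the mean of g is

$\frac12\Bigl(\binom{n}{2} - \binom{n_1}{2} - \binom{n_2}{2} - \cdots\Bigr) = \frac14\,(n^2 - n_1^2 - n_2^2 - \cdots).$

The variance is, similarly,

$\frac{1}{72}\bigl(n(n-1)(2n+5) - n_1(n_1-1)(2n_1+5) - \cdots\bigr) = \frac{1}{36}(n^3 - n_1^3 - n_2^3 - \cdots) + \frac{1}{24}(n^2 - n_1^2 - n_2^2 - \cdots).$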
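The closed forms for the mean and variance can be spot-checked exactly with rational arithmetic. This sketch enumerates the distinct permutations of small multisets, where the counts n_1, n_2, ... are the parameters. It then compares the observed moments of the inversion count with the formulas above. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def inversion_stats(counts):
    """Exact mean and variance of the number of inversions of a uniformly
    random permutation of the multiset {counts[0]*1, counts[1]*2, ...}."""
    word = [v for v, c in enumerate(counts, start=1) for _ in range(c)]
    n = len(word)
    invs = [sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
            for p in set(permutations(word))]
    mean = Fraction(sum(invs), len(invs))
    var = Fraction(sum(v * v for v in invs), len(invs)) - mean ** 2
    return mean, var


def predicted_stats(counts):
    """(n^2 - sum n_i^2)/4 and (n^3 - sum n_i^3)/36 + (n^2 - sum n_i^2)/24."""
    n = sum(counts)
    s2 = sum(c * c for c in counts)
    s3 = sum(c ** 3 for c in counts)
    return Fraction(n * n - s2, 4), Fraction(n ** 3 - s3, 36) + Fraction(n * n - s2, 24)
```

With all n_i = 1 the formulas reduce to the familiar n(n−1)/4 and n(n−1)(2n+5)/72 for ordinary permutations.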

18. Yes; the construction of exercise 5.1.1-25 can be extended in a straightforward way. Alternatively we can generalize the proof following 5.1.1-(14), by constructing a one-to-one correspondence between m-tuples (q_1, ..., q_m), where q_j is a multiset containing n_j nonnegative integers, on the one hand, and ordered pairs of n-tuples ((a_1, ..., a_n), (p_1, ..., p_n)) on the other hand, where a_1 ... a_n is a permutation of {n_1·1, ..., n_m·m}, and p_1 ≥ ··· ≥ p_n ≥ 0. This correspondence is defined as before, giving all elements of q_j the subscript j; it satisfies the condition

Σ(q_1) + ··· + Σ(q_m) = ind(a_1 ... a_n) + (p_1 + ··· + p_n),

where Σ(q_j) denotes the sum of the elements of q_j. [For a further generalization of the technique used in this proof and in the derivation of Eq. 5.1.3-(8), see D. E. Knuth, Math. Comp. 24 (1970), 955-961. See also the comprehensive treatment by Richard P. Stanley in Memoirs Amer. Math. Soc. 119 (1972).]

19. (a) Let S = {σ | σ is prime, σ is a left factor of π}. If S has k elements, the left factors λ of π such that μ(λ) ≠ 0 are precisely the 2^k intercalations of the subsets of S (see the proof of Theorem C); hence Σ_λ μ(λ) = ∏_{σ∈S} (1 + μ(σ)) = 0, since μ(σ) = −1 and S is nonempty. (b) Clearly ε(i_1 ... i_n) = μ(π) = 0 if i_j = i_k for some j ≠ k. Otherwise ε(i_1 ... i_n) = (−1)^r where i_1 ... i_n has r inversions; this is (−1)^s, where i_1 ... i_n has s even cycles; and this is (−1)^{n+t} where i_1 ... i_n has t cycles.

20. (a) Obvious, by definition of intercalation. (b) By definition,

$\det(b_{ij}) = \sum_{1\le i_1,\ldots,i_m\le m} \epsilon(i_1 \ldots i_m)\, b_{1i_1} \ldots b_{mi_m}.$

Setting b_{ij} = δ_{ij} − a_{ij}x_j and applying exercise 19(b), we obtain

'y  ' 'y  ']  xii  ■ • . xin  p(xii  . . • xin  ) v(xi\  * - * xin  ) , 

n> 0 l<ii 

since μ(π) is usually zero.

(c) Use exercise 19(a) to show that D⊺G = 1 when we regard the products of x’s as permutations of noncommutative variables, using the natural algebraic convention (α + β)⊺π = α⊺π + β⊺π.

A succinct rendition of this combinatorial proof and similar proofs of other important theorems has been given by D. Zeilberger, Discrete Math. 56 (1985), 61-72.
21. nr=i (nfe+ Vi£nfc_<i)> we nk> = 0 for k < 0, since there are (”m+ nt?m~d) ways
to  insert  the  m’s  into  such  a permutation  of  {nj  • 1, . . . , nm-\  ■ (m  — 1)}. 

22. (a) The left-right reversal of l(π) is in P_0(0^k 1^{n_1} ... t^{n_t}), for some k; but instead of reversing l(π), we will give it a two-line form by placing 0 last instead of first in the top line. The number k of 0s in l(π) and r(π) is the number of columns ${j \atop k}$ in the two-line form of π for which j ≤ t < k; this is also the number of columns with k ≤ t < j. We can easily reconstruct π from the two-line forms of l(π) and r(π), because each column ${j \atop k}$ with j, k ≤ t occurs in l(π), each column with t < j, k occurs in r(π), and the


o or  k of  1(tt)  with 


k or 


of  r( 7r)  from 


remaining  columns  are  obtained  by  merging  30 
left  to  right. 

(b) Let π be a permutation of the stated form, and let σ be any permutation of P_0(0^{n_0} 1^{n_1} ... m^{n_m}). Construct λ as follows: Delete the first n_0 entries of σ; then replace the 0s by x’s, subscripted with the first n_0 entries of π; replace the other elements by y’s, subscripted with the remaining nonzero entries of π. Also construct ρ as follows: Delete the 0s of σ, and replace the n_j occurrences of j with x_j or y_j according as the columns ${j \atop k}$ of π have k = 0 or k ≠ 0, from left to right. For example, if π = (23131302310102032010) and σ = (32313201103201300201), we have λ = x_2 y_2 y_3 x_3 y_1 y_1 x_1 y_2 y_3 x_3 x_1 y_2 x_3 y_1 and ρ = y_3 y_2 y_3 x_1 x_3 x_2 y_1 y_1 y_3 y_2 y_1 x_3 x_2 x_1. Conversely, we can reconstruct π and σ from λ and ρ.

(c)  We  have  w(n)  = w(K^))  w(r(n))  in  the  construction  of  (a),  because  column  Jk 
of  7r  either  becomes  3k  of  weight  vjj/wk  in  l(n)  or  r(7r),  or  it  is  factored  into  columns 


and  k having  weights  Zj/zo  and  Zo/ zk.  If  l(n)  has  pj  columns  ° and  qj  columns  30, 


its  weight  is  rij=i(2 


qj  n. 


-pC 


) = Y\j=i(wi/zi)Pi  Now  nj=iHV2i)  qj 


is  the  complex  conjugate  of  Y\j=\(wj / zj)Qj 5 so  the  sum  of  weights  over  all  elements  of 
Po(0fclni  . . . tnt)  simplifies  to 


fc!  (m  -I 1 -nt  — k)\ 

ni! . . . nt\ 


E 

Pl  H hpt- 


Similar remarks apply to r(π). The stated sum is positive because the term for k = 0 is nonzero.

23. We can assume that the original strand was sorted. Let t = 2, m = 4, w_1 = w_3 = z_1 = z_2 = +1, w_2 = w_4 = z_3 = z_4 = −1 in part (c) of the previous exercise. Then w(π) = (−1)^d, where d is the number of columns ${j \atop k}$ with j ≠ k. [See Gillis and Zeilberger, European J. Comb. 4 (1983), 221-223. This result was first proved in a completely different way by Askey, Ismail, and Koornwinder, J. Comb. Theory A25 (1978), 277-287, who found intriguing connections between multiset permutations and integrals of products of the Laguerre polynomials $L_n^{\alpha}(x) = \sum_{k\ge0}\binom{n+\alpha}{n-k}(-x)^k/k!$.] The analogous result for a five-letter alphabet is false, because the 5! permutations of {1, 2, 3, 4, 5} include 1 + 10 + 45 with an even number of differences, 0 + 20 + 44 with an odd number.
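The five-letter counts quoted above are easy to confirm by enumeration. In this sketch, "differences" is read as the number of positions in which a permutation of {1, 2, 3, 4, 5} differs from the sorted order. That is the interpretation the quoted totals 1 + 10 + 45 and 0 + 20 + 44 suggest.

```python
from collections import Counter
from itertools import permutations


def differences_distribution(n):
    """Map d -> number of permutations of {1, ..., n} that differ from the
    identity in exactly d positions."""
    tally = Counter()
    for p in permutations(range(1, n + 1)):
        d = sum(1 for i, v in enumerate(p, start=1) if v != i)
        tally[d] += 1
    return dict(tally)


def parity_split(n):
    """(number of permutations with even d, number with odd d)."""
    dist = differences_distribution(n)
    even = sum(c for d, c in dist.items() if d % 2 == 0)
    odd = sum(c for d, c in dist.items() if d % 2 == 1)
    return even, odd
```

For n = 5 this yields 56 permutations with an even number of differences and 64 with an odd number.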

24.  (a)  Transposing  " * twice  restores  “ xz.  Given  sort(  xy\  ;;;  *" ) = (*),  ;;;  ^ ),  unsort 

it  by  finding  the  leftmost  x in  the  top  row  and  transposing  it  to  the  left.  This  brings 
out  the  proper  y.  (The  value  of  sort(^?  ;;;  xy?  ) is  also  uniquely  determined.) 

(b) We are essentially expressing the two-line notation of π in the form

^ SOrt  ( 2/2  • • • 2?2n2  • • • Vt  • • • 3? tnt 

\xu  ...  2/1  X21  ...  V2  ...  Xti  ...  yt  ) ' 
and part (a) provides us with precisely the tools we need. [When R preserves certain
statistics  of  the  two-line  notation,  this  construction  provides  combinatorial  proofs  of 
interesting  theorems.  See  Guo-Niu  Han,  Advances  in  Math.  105  (1994),  26-41.] 


SECTION  5.1.3 

1. We must only show that this value makes (11) valid for x = k, when k > 1. Using (7), the formula becomes


*-=E(r:1)(‘+rr)=  £ «"(T)(”Tr) 


0 <j <r < fc 
k k — s 


e-’B-d'CTH 


n + 1 \ (n  + k — s — j 


8= 0 j= 0 


)■ 


For s < k, the sum on j can be extended to the range 0 ≤ j ≤ n + 1, and it is zero (the (n + 1)st difference of an nth-degree polynomial in j).

2. (a) The number of sequences a_1 a_2 ... a_n containing each of the elements {1, 2, ..., q} at least once is $q!\left\{{n \atop q}\right\}$, by exercise 1.2.6-64; the number of such sequences satisfying the analog of (10), for m = q, is since we must choose n − q of the possible = signs. (b) Add the results of (a) for q = n − m and q = n − m + 1.


3. We have

$\sum_{n\ge0}\Bigl(\sum_k \left\langle{n \atop k}\right\rangle (-1)^k\Bigr)\frac{x^{n+1}}{n!} = \frac{-4x}{e^{-4x}-1} - \frac{-2x}{e^{-2x}-1}$

by (20), hence the result is $(-1)^{n+1}B_{n+1}2^{n+1}(2^{n+1}-1)/(n+1)$. Alternatively, the identity 2/(e^{−2x} + 1) = 1 + tanh x lets us express the answer as (−1)^{(n−1)/2} T_n when n is odd, where T_n denotes the tangent number defined by the formula

tan z = T_1 z + T_3 z^3/3! + T_5 z^5/5! + ··· .

When n > 0 is even, the sum obviously vanishes, by (7). Incidentally, (18) now yields the curious Stirling number identity $\sum_k \left\{{n \atop k}\right\} k!/(-2)^k = 2B_{n+1}(1 - 2^{n+1})/(n+1)$.
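The closed form can be checked against the Eulerian numbers directly. The sketch makes three assumptions:

* the sum in question is Σ_k ⟨n k⟩(−1)^k, with ⟨n k⟩ indexed by k = 0, ..., n − 1;
* Bernoulli numbers follow the convention B_1 = −1/2;
* the function names are illustrative only.

It compares the sum with the Bernoulli-number formula and lists the first few values, which should match the coefficients of 1 + tanh x. It also checks the Stirling number identity.

```python
from fractions import Fraction
from math import comb, factorial


def eulerian_row(n):
    """Eulerian numbers <n k> for k = 0, 1, ..., via
    <n k> = (k+1)<n-1 k> + (n-k)<n-1 k-1>, starting from <0 0> = 1."""
    row = [1]
    for m in range(1, n + 1):
        row = [(k + 1) * (row[k] if k < len(row) else 0)
               + (m - k) * (row[k - 1] if k >= 1 else 0)
               for k in range(m)]
    return row


def bernoulli(nmax):
    """B_0, ..., B_nmax (with B_1 = -1/2) from sum_{k<=m} C(m+1, k) B_k = 0."""
    B = [Fraction(1)]
    for m in range(1, nmax + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def stirling2_row(n):
    """Stirling numbers {n k} of the second kind for k = 0..n."""
    row = [1]
    for m in range(1, n + 1):
        row = [0] + [k * (row[k] if k < len(row) else 0) + row[k - 1]
                     for k in range(1, m + 1)]
    return row


B = bernoulli(20)


def alternating_sum(n):
    """sum_k (-1)^k <n k>."""
    return sum((-1) ** k * e for k, e in enumerate(eulerian_row(n)))


def bernoulli_form(n):
    """(-1)^(n+1) B_(n+1) 2^(n+1) (2^(n+1) - 1) / (n + 1)."""
    return (-1) ** (n + 1) * B[n + 1] * 2 ** (n + 1) * (2 ** (n + 1) - 1) / (n + 1)


def stirling_side(n):
    """sum_k {n k} k! / (-2)^k as an exact fraction."""
    return sum(Fraction(s * factorial(k), (-2) ** k)
               for k, s in enumerate(stirling2_row(n)))
```

The first values 1, 0, −2, 0, 16, 0, −272 for n = 1, ..., 7 are ±T_n for odd n, with the signs (−1)^{(n−1)/2}.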

4.  (— l)n+m(^)-  (Consider  the  coefficient  of  zm+1  in  (18).) 

5. $\left\langle{p \atop k}\right\rangle \equiv (k+1)^p - k^p \equiv (k+1) - k = 1$ (modulo p) for 0 ≤ k < p, by formula (13), exercise 1.2.6-10, and Theorem 1.2.4F.
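The congruence is easy to confirm numerically. This sketch computes Eulerian rows by the standard recurrence, using an illustrative helper rather than code from the text. It also shows that the property can fail when the modulus is composite.

```python
from math import isqrt


def eulerian_row(n):
    """Eulerian numbers <n k>, k = 0..n-1, via
    <n k> = (k+1)<n-1 k> + (n-k)<n-1 k-1>."""
    row = [1]
    for m in range(1, n + 1):
        row = [(k + 1) * (row[k] if k < len(row) else 0)
               + (m - k) * (row[k - 1] if k >= 1 else 0)
               for k in range(m)]
    return row


def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, isqrt(p) + 1))


def all_congruent_to_one(n):
    """True if every Eulerian number <n k>, 0 <= k < n, is 1 modulo n."""
    return all(e % n == 1 for e in eulerian_row(n))
```

For instance, the row 1, 26, 66, 26, 1 for p = 5 is all ones modulo 5. The row 1, 11, 11, 1 for n = 4 is not all ones modulo 4.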

6. Summing first on k is not allowed, because the terms are nonzero for arbitrarily large j and k, and the sum of the absolute values is infinite.

For a simpler example of the fallacy, let a_{jk} = (k − j)[|j − k| = 1]. Then

$\sum_{j\ge0}\Bigl(\sum_{k\ge0} a_{jk}\Bigr) = \sum_{j\ge0}\delta_{j0} = +1, \quad\text{while}\quad \sum_{k\ge0}\Bigl(\sum_{j\ge0} a_{jk}\Bigr) = \sum_{k\ge0}(-\delta_{k0}) = -1.$
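Each row and each column of a_{jk} has at most two nonzero entries, so both iterated sums can be evaluated exactly. This small sketch makes the two different answers concrete.

```python
def a(j, k):
    """a_{jk} = (k - j) [|j - k| = 1]."""
    return (k - j) if abs(j - k) == 1 else 0


def row_sum(j):
    """Sum over k >= 0 of a_{jk}; only k = j - 1 and k = j + 1 can be nonzero."""
    return sum(a(j, k) for k in range(max(j - 1, 0), j + 2))


def column_sum(k):
    """Sum over j >= 0 of a_{jk}; only j = k - 1 and j = k + 1 can be nonzero."""
    return sum(a(j, k) for j in range(max(k - 1, 0), k + 2))
```

The row sums are 1, 0, 0, ... and the column sums are −1, 0, 0, ..., so the two orders of summation give +1 and −1.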

7.  Yes.  [F.  N.  David  and  D.  E.  Barton,  Combinatorial  Chance  (1962),  150-154;  see 
also  the  answer  to  exercise  25.] 

8. [Combinatory Analysis 1 (1915), 190.] By inclusion and exclusion. For example, 1/((l_1 + l_2)! l_3! (l_4 + l_5 + l_6)!) is the probability that x_1 < ··· < x_{l_1+l_2}, x_{l_1+l_2+1} < ··· < x_{l_1+l_2+l_3}, and x_{l_1+l_2+l_3+1} < ··· < x_{l_1+l_2+l_3+l_4+l_5+l_6}.

A simple O(n²) algorithm to count the number of permutations of {1, ..., n} having respective run lengths (l_1, ..., l_k) has been given by N. G. de Bruijn, Nieuw Archief voor Wiskunde (3) 18 (1970), 61-65.
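De Bruijn's algorithm itself is not reproduced here. As a hedged illustration that such counts can be computed in O(n²) time, the sketch uses a standard dynamic program over the up/down signature of the permutation. The signature has ascents inside each run and a descent between consecutive runs. The sketch checks the result against brute force for n ≤ 6. The names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def count_with_run_lengths(runs):
    """Number of permutations of {1..n} whose ascending runs have lengths
    runs[0], runs[1], ... (n = sum(runs)), via an O(n^2) signature DP."""
    signature = []
    for idx, length in enumerate(runs):
        if idx > 0:
            signature.append('D')             # descent between runs
        signature.extend('A' * (length - 1))  # ascents inside a run
    # f[j] = number of patterns on the first i elements whose last
    # element has relative rank j (0-based) among those i elements
    f = [1]
    for step in signature:
        i = len(f)
        prefix = [0]
        for v in f:
            prefix.append(prefix[-1] + v)
        if step == 'A':   # new last element ranks above the previous last
            f = [prefix[j] for j in range(i + 1)]
        else:             # new last element ranks below the previous last
            f = [prefix[i] - prefix[j] for j in range(i + 1)]
    return sum(f)


def run_lengths(p):
    """Lengths of the maximal ascending runs of the sequence p."""
    lengths, current = [], 1
    for x, y in zip(p, p[1:]):
        if y > x:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return tuple(lengths)


def tally_run_lengths(n):
    """Brute-force count of permutations of {0..n-1} by run-length tuple."""
    return Counter(run_lengths(p) for p in permutations(range(n)))
```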
9.  Pkm  = qkm  ~ qk(m+ 1)  in  (23).  Since  Y,k,m^m.zrnxk  = j^g(x,z)  and  0(2:,  0)  = 1,
we  have 


h(z,  x)  = V hk(z)xk  = 7-^—g{x,  z)(\  - z *)  + X z 1 
z ' 1 — X 1 — X 


(1  — 2 k)x  Z kx 

e(x  — 1)2  — x 1 — X 


Thus  hi (2)  = e*  - (e*  - 1 )/Z;  h2(Z)  = (e2z  - 2ez)  + ez  - (e2z  - l)/z. 

10.  Let  Mn  = Li+-  ■ - + Ln  be  the  mean;  then  Mnxn  = h'{  1,  2),  where  the  derivative 
is  taken  with  respect  to  2,  and  this  is  x/(eI_1  — x)  — x/{\  — 2)  = M(x),  say.  By  the 
residue  theorem 


f M{z)z~n-1  dz  = Mn  - 2(n  + f ) + 1 + 

27TI  J J 2i  — 1 2i  — 1 

if  we  integrate  around  a circle  of  radius  r where  1 21 1 < r < 1 22 1 . (Note  the  double  pole  at 
2 = 1.)  Furthermore,  the  absolute  value  of  this  integral  is  less  than  f\M(z)  \ r_n_1  dz  = 
0(r~n).  Integrating  over  larger  and  larger  circles  gives  the  convergent  series  M„  = 
2n-S+£*>i2*(l/2?(l-2fc)). 

To  find  the  variance,  we  have  h"(\,x)  = —2h'(l,x)  — 2x(x  — l)ex~1/(ex~1  — a)2. 
An  argument  similar  to  that  used  for  the  mean,  this  time  with  a triple  pole,  shows 
that  the  coefficients  of  h"(l,x)  are  asymptotically  4n2  + | n — 2 M„  plus  smaller  terms; 
this  leads  to  the  asymptotic  formula  |n  + | (plus  exponentially  smaller  terms)  for  the 
variance. 


11.  Pkn  = £tl>i tk-i>iD(ti, . . . , ffc_i, n,  1),  where  D(h,l2,  is  MacMahon’s 

determinant  of  exercise  8.  Evaluating  this  determinant  by  its  first  row,  we  find  Pkn  = 
coP(fe_i)„  + c\P(k-2)n  H h Ck~2 Pin  — Ek(n),  where  Cj  and  Ek  are  defined  as  follows: 


ci  = (-1)3  J2 

n l+i^1 


1 

(ti  + ■ ■ • + tj+ 1)! 


<-'>'£(  7) 


m>  0 


1 

(m  + 1)! 


e o-jm 


1 


r,m>  0 


( m + 1)! 


-1  + e 


(a 


Ei(n)  = l/(n  + 1)!  - 1/n! ; E2(n)  = [n>  0 ]/(n  + 1)! ; 

«*>  *** 

m>0  v ' 

Let  Pon  = 0,  C (2)  = £cj2j  = (e1_I  — 1)/(1  — 2),  and  let 

_ V'  p /_!..»_*  _ ez2x2  - ex(l-x  + 2x)(2+a:-l)  - e*+x(l-2)2(l-a;)2 
E(z,x)  — 2_^Ek+i(n)z  x - _______ 


n,k 


The  recurrence  relation  we  have  derived  is  equivalent  to  the  formula  C(x)H(z,x)  = 
H(z,x)/x  + E(z,x)~,  hence  H(z,x)  = E(z,x)x(  1 — x)/(xe1~x  — 1).  Expanding  this 
power  series  gives  H\(z)  = hi (2)  (see  exercise  9);  H2(z)  = ehi(z)  + 1 — ez . 

[Note: The generating functions for the first three runs were derived by Knuth, CACM 6 (1963), 685-688. Barton and Mallows, Ann. Math. Statistics 36 (1965), 249, stated the formula 1 − H_{n+1}(z) = (1 − H_n(z))/(1 − z) − L_n h_1(z) for n ≥ 1, together with (25). Another way to attack this problem is illustrated in exercise 23. Because adjacent runs are not independent, there is no simple relation between the problem solved here and the simpler (probably more useful) result of exercise 9.]
12. [Combinatory Analysis 1 (1915), 209-211.] The number of ways to put the multiset into t distinguishable boxes is

$N_t = \binom{t+n_1-1}{n_1}\binom{t+n_2-1}{n_2}\cdots\binom{t+n_m-1}{n_m},$

since there are $\binom{t+n_1-1}{n_1}$ ways to place the 1s, etc. If we require that no box be empty, the method of inclusion and exclusion tells us that the number of ways is

$M_t = N_t - \binom{t}{1}N_{t-1} + \binom{t}{2}N_{t-2} - \cdots.$

Let P_k be the number of permutations having k runs; if we put k − 1 vertical lines between the runs, and t − k additional vertical lines in any of the n − k remaining places, we get one of the M_t ways to divide the multiset into t nonempty distinguishable parts. Hence

$M_t = P_t + \binom{n-t+1}{1}P_{t-1} + \binom{n-t+2}{2}P_{t-2} + \cdots.$

Equating the two values of M_t allows us to determine P_1, P_2, ... successively in terms of N_1, N_2, .... (A more direct proof would be desirable.)
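The successive determination of P_1, P_2, ... from N_1, N_2, ... can be sketched directly. The code assumes every n_j ≥ 1. It counts runs in the usual way for multisets, where a descent is a place with a_i > a_{i+1}. It compares the result with brute-force enumeration. The names are illustrative.

```python
from itertools import permutations
from math import comb


def runs_by_inclusion_exclusion(counts):
    """P[k] = number of permutations of the multiset with counts[j-1]
    copies of j that have exactly k runs (assumes every count >= 1)."""
    n = sum(counts)

    def N(t):
        # ways to put the multiset into t distinguishable boxes
        prod = 1
        for nj in counts:
            prod *= comb(t + nj - 1, nj)
        return prod

    def M(t):
        # the same, but with no empty box (inclusion and exclusion)
        return sum((-1) ** i * comb(t, i) * N(t - i) for i in range(t + 1))

    P = [0] * (n + 1)
    for t in range(1, n + 1):
        # M_t = P_t + C(n-t+1, 1) P_{t-1} + C(n-t+2, 2) P_{t-2} + ...
        P[t] = M(t) - sum(comb(n - t + j, j) * P[t - j] for j in range(1, t))
    return P


def runs_brute(counts):
    """The same numbers by enumerating the distinct permutations."""
    word = [v for v, c in enumerate(counts, start=1) for _ in range(c)]
    P = [0] * (len(word) + 1)
    for p in set(permutations(word)):
        descents = sum(1 for x, y in zip(p, p[1:]) if x > y)
        P[descents + 1] += 1
    return P
```

For {1, 2, 3} this reproduces the Eulerian numbers 1, 4, 1 as P_1, P_2, P_3.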

13. 1 + ½ · 13 · 3 = 20.5.

14. By Foata’s correspondence the given permutation corresponds to

$(3\,1)\intercal(1)\intercal\cdots\intercal(4) = \begin{pmatrix}1&1&1&1&2&2&2&2&3&3&3&3&4&4&4&4\\ 3&1&1&2&3&4&3&2&1&1&3&4&2&2&4&4\end{pmatrix};$

by (33) this corresponds to

$\begin{pmatrix}1&1&1&1&2&2&2&2&3&3&3&3&4&4&4&4\\ 2&4&4&3&3&3&1&1&4&4&2&1&2&1&2&3\end{pmatrix},$

which corresponds to 2342341421432131 with 9 runs.

15. The number of alternating runs is 1 plus the number of j such that 1 < j < n and either a_{j−1} < a_j > a_{j+1} or a_{j−1} > a_j < a_{j+1}. For fixed j, the probability is 2/3; hence the average, for n ≥ 2, is 1 + (2/3)(n − 2).

16. Each permutation of {1, 2, ..., n − 1}, having k alternating runs, yields k permutations with k such runs, 2 with k + 1, and n − k − 2 with k + 2, when the new element n is inserted in all possible places. Hence, if A_{nk} denotes the number of permutations of {1, 2, ..., n} having k alternating runs,

$A_{nk} = kA_{(n-1)k} + 2A_{(n-1)(k-1)} + (n-k)A_{(n-1)(k-2)}.$

It is convenient to let A_{1k} = δ_{k0}, G_1(z) = 1. Then

$G_n(z) = \frac{z}{n}\bigl((1-z^2)G'_{n-1}(z) + (2 + (n-2)z)G_{n-1}(z)\bigr).$

Differentiation leads to the recurrence

$x_n = \frac{1}{n}\bigl((n-2)x_{n-1} + 2n - 2\bigr)$

for x_n = G'_n(1), and this has the solution x_n = (2/3)n − 1/3 for n ≥ 2. Another differentiation leads to the recurrence

$y_n = \frac{1}{n}\bigl((n-4)y_{n-1} + \tfrac{8}{3}n^2 - \tfrac{26}{3}n + 6\bigr)$

for y_n = G''_n(1). Set y_n = αn² + βn + γ and solve for α, β, γ to get y_n = (4/9)n² − (14/15)n + 11/90 for n ≥ 4. Hence var(G_n) = (16n − 29)/90, n ≥ 4.
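The insertion argument implies the recurrence count(n, k) = k·count(n−1, k) + 2·count(n−1, k−1) + (n−k)·count(n−1, k−2) for the number of permutations of {1, ..., n} with k alternating runs. The sketch builds these counts from that recurrence. It uses the convention that a single element has 0 runs, so that G_1(z) = 1. The sketch then checks:

* the counts against enumeration for n ≤ 7;
* the mean (2n − 1)/3;
* the variance (16n − 29)/90 for n ≥ 4;
* André's observation that G_n(−1) = 0 for n ≥ 4.

The names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def alternating_runs(p):
    """1 + number of interior peaks and valleys; 0 for a single element."""
    if len(p) < 2:
        return 0
    turns = sum(1 for j in range(1, len(p) - 1)
                if p[j - 1] < p[j] > p[j + 1] or p[j - 1] > p[j] < p[j + 1])
    return 1 + turns


def counts_by_recurrence(nmax):
    """table[n][k] for 0 <= k < n, starting from table[1] = [1]."""
    table = {1: [1]}
    for n in range(2, nmax + 1):
        prev = table[n - 1]

        def get(k):
            return prev[k] if 0 <= k < len(prev) else 0

        table[n] = [k * get(k) + 2 * get(k - 1) + (n - k) * get(k - 2)
                    for k in range(n)]
    return table


def brute_counts(n):
    tally = Counter(alternating_runs(p) for p in permutations(range(n)))
    return [tally.get(k, 0) for k in range(n)]


def mean_and_variance(row):
    total = sum(row)
    mean = Fraction(sum(k * c for k, c in enumerate(row)), total)
    second = Fraction(sum(k * k * c for k, c in enumerate(row)), total)
    return mean, second - mean ** 2
```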
These formulas for the mean and variance are due to J. Bienaymé, who stated them without proof [Bull. Soc. Math. de France 2 (1874), 153-154; Comptes Rendus Acad. Sci. 81 (Paris, 1875), 417-423; see also Bertrand’s remarks on p. 458]. The recurrence relation for these numbers is due to D. André [Comptes Rendus Acad. Sci. 97 (Paris, 1883), 1356-1358; Annales Scientifiques de l’École Normale Supérieure (3) 1 (1884), 121-134]. André noted that G_n(−1) = 0 for n ≥ 4; thus, the number of permutations with an even number of alternating runs is n!/2. He also proved the formula for the mean, and determined the number of permutations that have the maximum number of alternating runs (see exercise 5.1.4-23). It can be shown that

$G_n(z) = \Bigl(\frac{1+z}{2}\Bigr)^{n-1}(1+w)^{n+1}\,g_n\Bigl(\frac{1-w}{1+w}\Bigr), \qquad w = \sqrt{\frac{1-z}{1+z}}, \qquad n \ge 2,$

where g_n(z) is the generating function (18) for ascending runs. [See David and Barton, Combinatorial Chance (London: Griffin, 1962), 157-162.]
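The formula relating alternating runs to ascending runs can be spot-checked numerically. This sketch makes two assumptions:

* the generating function (18) is normalized as g_n(z) = Σ_k ⟨n k⟩ z^{k+1}/n!, so the exponent is the number of ascending runs;
* G_n(z) = (1 + z)/2)^{n−1}(1 + w)^{n+1} g_n((1 − w)/(1 + w)) with w = √((1 − z)/(1 + z)).

It compares this right-hand side with a brute-force G_n(z) for several real z in (−1, 1] and for n ≤ 7.

```python
import math
from itertools import permutations


def eulerian_row(n):
    """Eulerian numbers <n k>, k = 0..n-1."""
    row = [1]
    for m in range(1, n + 1):
        row = [(k + 1) * (row[k] if k < len(row) else 0)
               + (m - k) * (row[k - 1] if k >= 1 else 0)
               for k in range(m)]
    return row


def ascending_runs_gf(n, z):
    """g_n(z) = sum_k <n k> z^(k+1) / n!  (assumed normalization of (18))."""
    return sum(e * z ** (k + 1) for k, e in enumerate(eulerian_row(n))) / math.factorial(n)


def alternating_runs(p):
    """1 + number of interior peaks and valleys."""
    return 1 + sum(1 for j in range(1, len(p) - 1)
                   if p[j - 1] < p[j] > p[j + 1] or p[j - 1] > p[j] < p[j + 1])


def alternating_runs_gf(n, z):
    """G_n(z) by brute force over all permutations of n elements (n >= 2)."""
    total = sum(z ** alternating_runs(p) for p in permutations(range(n)))
    return total / math.factorial(n)


def david_barton(n, z):
    """Right-hand side of the formula, for -1 < z <= 1."""
    w = math.sqrt((1 - z) / (1 + z))
    return ((1 + z) / 2) ** (n - 1) * (1 + w) ** (n + 1) * ascending_runs_gf(n, (1 - w) / (1 + w))
```

At z = 1 both sides equal 1. For n = 3 both reduce to (z + 2z²)/3.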

17 • (2n»:-1i);  (21.-2) end  with  °>  (2fc-i) end  with  !• 

18. (a) Let the given sequence be an inversion table as in Section 5.1.1. If it has k descents, the inverse of the corresponding permutation has k descents (see answer 5.1.1-8(e)); hence the answer is $\left\langle{n \atop k}\right\rangle$. (b) This quantity satisfies f(n, k) = kf(n−1, k) + (n−k+1)f(n−1, k−1), so it must be $\left\langle{n \atop k-1}\right\rangle$. [See D. Dumont, Duke Math. J. 41 (1974), 313-315.]

19. (a) (”), by the correspondence of Theorem 5.1.2B. (b) There are (n − k)! ways to put n − k further nonattacking rooks on the entire board; hence the answer is 1/(n − k)! times $\sum_{j\ge0} a_{nj}\binom{j}{k}$, where $a_{nj} = \left\langle{n \atop j}\right\rangle$ by part (a). This comes to $\left\{{n \atop n-k}\right\}$, by exercise 2.

A direct proof of this result, due to E. A. Bender, associates each partition of {1, 2, ..., n} into k nonempty disjoint subsets with an arrangement of n − k rooks: Let the partition be {1, 2, ..., n} = {a_{11}, a_{12}, ..., a_{1n_1}} ∪ ··· ∪ {a_{k1}, ..., a_{kn_k}}, where a_{ij} < a_{i(j+1)} for 1 ≤ j < n_i, 1 ≤ i ≤ k. The corresponding arrangement puts rooks in column a_{ij} of row a_{i(j+1)}, for 1 ≤ j < n_i, 1 ≤ i ≤ k. For example, the configuration illustrated in Fig. 4 corresponds to the partition {1, 3, 8} ∪ {2} ∪ {4, 6} ∪ {5} ∪ {7}.

[Duke Math. J. 13 (1946), 259-268. Sections 2.3 and 2.4 of Richard Stanley’s Enumerative Combinatorics 1 (1986) discuss rook placement in general.]
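Bender's correspondence places rooks in cells (row, column) with column < row. The sketch therefore takes the board to be the cells strictly below the main diagonal of an n × n board, which is an assumption about its orientation. It checks by brute force, for n ≤ 6, that m nonattacking rooks can be placed in {n, n − m} ways. The helper names are illustrative.

```python
from itertools import combinations


def staircase_rook_count(n, m):
    """Ways to place m nonattacking rooks on the cells (row, col) with
    1 <= col < row <= n."""
    cells = [(r, c) for r in range(1, n + 1) for c in range(1, r)]
    count = 0
    for chosen in combinations(cells, m):
        if len({r for r, _ in chosen}) == m and len({c for _, c in chosen}) == m:
            count += 1
    return count


def stirling2(n, k):
    """Stirling numbers of the second kind, {n k} = k{n-1 k} + {n-1 k-1}."""
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]
```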

20.  The  number  of  readings  is  the  number  of  runs  in  the  inverse  permutation.  The 
first  run  corresponds  to  the  first  reading,  etc. 

21.  It  has  n + 1 — k runs  and  requires  n + 1 — j readings. 

22. [J. Combinatorial Theory 1 (1966), 350-374.] If rs < n, some reading will pick up t > r elements, a_{i_1} = j + 1, ..., a_{i_t} = j + t, where i_1 < ··· < i_t. We cannot have a_m > a_{m+1} for all m in the range i_k ≤ m < i_{k+1}, so the permutation contains at least t − 1 places with a_m < a_{m+1}; it therefore has at most n − t + 1 runs.

On the other hand, consider the permutation α_r ... α_2 α_1, where block α_j contains the numbers ≡ j (modulo r), in decreasing order; for example, when n = 9 and r = 4, this permutation is 847362951. If n ≥ 2r − 1, this permutation has r − 1 ascents, so it has n + 1 − r runs. Moreover, it requires exactly n + 1 − ⌈n/r⌉ readings, if r > 1. We can rearrange the elements of {kr + 1, ..., kr + r} arbitrarily without changing the number of runs, thereby reducing the number of readings to any desired value ≥ ⌈n/r⌉.

Now suppose rs ≥ n and r + s ≤ n + 1 and r, s ≥ 1. By exercises 20 and 21 we can assume that r ≤ s, since the reflection of the inverse of a permutation with n + 1 − r runs and s readings has n + 1 − s runs and r readings. Then the construction in the preceding paragraph handles all cases except those where s > n + 1 − ⌈n/r⌉ and r ≥ 2. To complete the proof we may use a permutation of the form

2k+1 2k−1 ... 1 n+2−r n+1−r ... 2k+2 2k ... 2 n+3−r ... n−1 n,

which has n + 1 − r runs and n + 1 − r − k readings, for 0 ≤ k ≤ ½(n − r).
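The constructions in this answer are easy to test mechanically. The tests use the fact from exercise 20 that the number of readings equals the number of runs of the inverse permutation. The sketch checks three things:

* the example 847362951;
* the block permutation α_r ... α_1 for n ≥ 2r − 1;
* the final family of permutations.

All function names are illustrative.

```python
def runs(p):
    """Number of ascending runs = 1 + number of descents."""
    return 1 + sum(1 for x, y in zip(p, p[1:]) if x > y)


def readings(p):
    """Number of readings = number of runs of the inverse permutation."""
    inverse = [0] * len(p)
    for position, value in enumerate(p, start=1):
        inverse[value - 1] = position
    return runs(inverse)


def block_permutation(n, r):
    """alpha_r ... alpha_1, where alpha_j holds the numbers in {1..n}
    congruent to j (modulo r) in decreasing order."""
    result = []
    for j in range(r, 0, -1):
        result.extend(sorted((x for x in range(1, n + 1) if x % r == j % r),
                             reverse=True))
    return result


def special_permutation(n, r, k):
    """2k+1 2k-1 ... 1, n+2-r n+1-r ... 2k+2, 2k ... 2, n+3-r ... n."""
    odds = list(range(2 * k + 1, 0, -2))
    middle = list(range(n + 2 - r, 2 * k + 1, -1))
    evens = list(range(2 * k, 0, -2))
    tail = list(range(n + 3 - r, n + 1))
    return odds + middle + evens + tail
```

For example, block_permutation(9, 4) is [8, 4, 7, 3, 6, 2, 9, 5, 1], with 6 runs and 7 readings.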

23. [SIAM Review 3 (1967), 121-122.] Assume that the infinite permutation consists of independent samples from the uniform distribution. Let f_k(x) dx be the probability that the kth long run begins with x; and let g(u, x) dx be the probability that a long run begins with x, when the preceding long run begins with u. Then f_1(x) = 1, $f_{k+1}(x) = \int_0^1 f_k(u)\,g(u,x)\,du$. We have $g(u,x) = \sum_{m\ge1} g_m(u,x)$, where

    g_m(u, x) = Pr(u < X_1 < ··· < X_m > x or u > X_1 > ··· > X_m < x)
              = Pr(u < X_1 < ··· < X_m) + Pr(u > X_1 > ··· > X_m)
                − Pr(u < X_1 < ··· < X_m < x) − Pr(u > X_1 > ··· > X_m > x)
              = (u^m + (1 − u)^m − |u − x|^m)/m!;

hence g(u, x) = e^u + e^{1−u} − 1 − e^{|u−x|}, and we find f_2(x) = 2e − 1 − e^x − e^{1−x}. One can show that f_k(x) approaches the limiting value (2 cos(x − ½) − sin ½ − cos ½)/(3 sin ½ − cos ½). The average length of a run starting with x is e^x + e^{1−x} − 1; hence the length L_k of the kth long run is $\int_0^1 f_k(x)(e^x + e^{1-x} - 1)\,dx$; L_1 = 2e − 3 ≈ 2.43656; L_2 = 3e² − 8e + 2 ≈ 2.42091. See Section 5.4.1 for similar results.
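The integrals in this answer can be checked with a simple midpoint rule. The quadrature step count in the sketch is an arbitrary choice. The sketch confirms the closed form for f_2(x) and the values of L_1 and L_2. It also confirms that the stated limiting density integrates to 1 and is left unchanged by one more application of the kernel g(u, x).

```python
import math


def g(u, x):
    """Kernel g(u, x) = e^u + e^(1-u) - 1 - e^|u-x|."""
    return math.exp(u) + math.exp(1 - u) - 1 - math.exp(abs(u - x))


def integrate(f, steps=4000):
    """Midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


def f2_closed(x):
    """Stated density for the start of the second long run."""
    return 2 * math.e - 1 - math.exp(x) - math.exp(1 - x)


def run_length_from(x):
    """Average length of a long run that starts with x."""
    return math.exp(x) + math.exp(1 - x) - 1


S, C = math.sin(0.5), math.cos(0.5)


def f_limit(x):
    """Stated limiting density for the start of the kth long run."""
    return (2 * math.cos(x - 0.5) - S - C) / (3 * S - C)
```

Since f_1(x) = 1, f_2(x) is just the integral of g(u, x) over u. Stationarity of f_limit means that the integral of f_limit(u) g(u, x) over u reproduces f_limit(x).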

24.  Arguing  as  before,  the  result  is 

1+  J2  ^(p2  + q2)k(p2  + 2pq(2n~k~1  — 1 + q2 ((2pq)n~k~1  — l)/(2pq  — 1))); 

0<fc<7l 

carrying  out  the  sum  and  simplifying  yields 

2 n(p2  + q2)n(p(p-q)/{p2  + q2  -pq)  - 5)  + (2 pq)npq3/(p2  + q2)(p2  + q2  -pq) 

+ q2/  (p2  + q2)  + 2n_1. 


25. Let V_j = (U_1 + ··· + U_j) mod 1; then V_1, ..., V_n are independent uniform random numbers in [0..1), forming a permutation that has k descents if and only if ⌊U_1 + ··· + U_n⌋ = k. Hence the answer is $\left\langle{n \atop k}\right\rangle/n!$, a property first noticed by S. Tanny [Duke Math. J. 40 (1973), 717-722]; see also W. Meyer and R. von Randow, Math. Annalen 193 (1971), 315-321.
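Two ingredients of this answer can be checked exactly with rational arithmetic. First, the number of descents of V_1, ..., V_n equals ⌊U_1 + ··· + U_n⌋; the sketch tests this on random rationals from a seeded generator. Second, Pr(k ≤ U_1 + ··· + U_n < k + 1) equals ⟨n k⟩/n!; the sketch tests this with the standard Irwin-Hall distribution function for a sum of n uniforms. The helper names are hypothetical.

```python
import random
from fractions import Fraction
from math import comb, factorial, floor


def eulerian(n, k):
    """<n k> = sum_{j=0}^{k} (-1)^j C(n+1, j) (k+1-j)^n."""
    return sum((-1) ** j * comb(n + 1, j) * (k + 1 - j) ** n for j in range(k + 1))


def irwin_hall_cdf(n, x):
    """Pr(U_1 + ... + U_n <= x) for an integer x with 0 <= x <= n."""
    total = sum((-1) ** j * comb(n, j) * (x - j) ** n for j in range(x + 1))
    return Fraction(total, factorial(n))


def descents_match_floor(n, trials, seed=1):
    """Check on random exact rationals in (0, 1) that V_j = (U_1+...+U_j) mod 1
    has exactly floor(U_1 + ... + U_n) descents."""
    rng = random.Random(seed)
    for _ in range(trials):
        partial, V = Fraction(0), []
        for _step in range(n):
            partial += Fraction(rng.randrange(1, 10 ** 6), 10 ** 6)
            V.append(partial - floor(partial))
        descents = sum(1 for a, b in zip(V, V[1:]) if b < a)
        if descents != floor(partial):
            return False
    return True
```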

26. For example, $\vartheta^5 (1-z)^{-1} = (z + 26z^2 + 66z^3 + 26z^4 + z^5)/(1-z)^6$.
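Assume ϑ denotes the operator z d/dz. Then the left side ϑ^5(1 − z)^{−1} is Σ_k k^5 z^k. The example can therefore be checked by expanding the right side, using 1/(1 − z)^6 = Σ_m C(m + 5, 5) z^m, and comparing coefficients. The function name is illustrative.

```python
from math import comb

NUMERATOR = [0, 1, 26, 66, 26, 1]  # coefficients of z^0, ..., z^5


def rhs_coefficient(k):
    """Coefficient of z^k in (z + 26z^2 + 66z^3 + 26z^4 + z^5) / (1 - z)^6,
    using 1/(1 - z)^6 = sum_m C(m + 5, 5) z^m."""
    return sum(c * comb(k - j + 5, 5) for j, c in enumerate(NUMERATOR) if j <= k)
```

For instance, rhs_coefficient(2) = 6 + 26 = 32 = 2^5.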

27. The following rule defines a one-to-one correspondence that takes a permutation a_1 a_2 ... a_n with k descents into an n-node increasing forest with k + 1 leaves: The first root is a_1, and its descendants are the forest corresponding to a_2 ... a_k, where k is minimal such that a_{k+1} < a_1 or k = n. [R. P. Stanley, Enumerative Combinatorics 1 (Wadsworth, 1986), Proposition 1.3.16.]
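The rule can be sketched recursively. Representing a forest as a list of (root, children) pairs is an illustrative choice. For n ≤ 6, the check confirms three things:

* the forests are increasing;
* each forest has one more leaf than the permutation has descents;
* the n! permutations give n! distinct forests.

```python
from itertools import permutations
from math import factorial


def forest(seq):
    """First root is seq[0]; its descendants form the forest for seq[1:k],
    where k is minimal with seq[k] < seq[0] (or k = len(seq)); the rest of
    the forest is built from seq[k:]."""
    if not seq:
        return []
    root = seq[0]
    k = next((i for i in range(1, len(seq)) if seq[i] < root), len(seq))
    return [(root, forest(seq[1:k]))] + forest(seq[k:])


def leaves(f):
    return sum(1 if not kids else leaves(kids) for _, kids in f)


def parent_map(f, parent=0, out=None):
    """Encode a forest as {node: parent}, with 0 marking a root."""
    if out is None:
        out = {}
    for node, kids in f:
        out[node] = parent
        parent_map(kids, node, out)
    return out


def check(n):
    """Leaves = descents + 1, children exceed parents, and all forests differ."""
    seen = set()
    for p in permutations(range(1, n + 1)):
        f = forest(list(p))
        descents = sum(1 for x, y in zip(p, p[1:]) if x > y)
        parents = parent_map(f)
        if leaves(f) != descents + 1:
            return False
        if any(par >= node for node, par in parents.items()):
            return False
        seen.add(tuple(parents[i] for i in range(1, n + 1)))
    return len(seen) == factorial(n)
```

For example, 2 3 1 becomes the forest with root 2 (child 3) followed by the single node 1.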

28.  The  poles  of  L(z)  are  the  values  of  T(  1/e),  where  T{z)  is  the  (multivalued)  tree 
function  defined  by  T(z)  = zeTI'z> . Thus  for  m > 0 we  have  the  convergent  series 


+ E+rE(-i)‘ 


(lncr„ 


\n+l— k 


(n  + 1 — k)\ 


om  = — 1 — (2m  + 1)  7ri 


[Corless,  Gonnet,  Hare,  Jeffrey,  and  Knuth,  Advances  in  Computational  Mathematics 


5 (1996),  329-359,  formula  (4.18)];  in  particular,  we  have  z„ 
(i  - jT  ln(27rem))/m  + 0((log m)2/m2). 


(2m+  L )7ri+ln(27rem)  + 
Let  P(z)  — X^m=o('z/(z  _ z m)  + z/(z  - zm)) ■ It  follows  that  P(x)  - P(—x)  =
E™=0  4$t(xzm/{x2  - 4>))  = E“=i  °((a:  loS m)/(a:2  + m2))  = Em=i  0((*  loS ^O/z2)  + 
5Em=x+i  O((xlogm)/m2)  = 0(loga:)  for  x > 1.  But  we  know  that  L(x)  + P{x ) = cx 
for  some  c;  hence  2cx  = L(x)  — L(—x)  + 0(log  x),  and  by  letting  x — >•  oo  in  (25)  we  find 
c = —1/2.  Hence  L\  = EEm=o  cos^m  — 1/2.  (This  result  is  due  to  Svante  Janson.) 

29. (a) If a_1 ... a_n has 2k alternating runs and k peaks, (n + 1 − a_1) ... (n + 1 − a_n) has k − 1 peaks. (b, c) See L. W. Shapiro, W.-J. Woan, and S. Getu, SIAM J. Algebraic and Discrete Methods 4 (1983), 459-466.

SECTION  5.1.4 


2. When p_i is inserted into column t, let the element in column t − 1 be p_j. Then (q_j, p_j) is in class t − 1, q_j < q_i, and p_j < p_i; so, by induction, indices exist with the property. Conversely, if q_j < q_i and p_j < p_i and if (q_j, p_j) is in class t − 1, then column t − 1 contains an element < p_i when p_i is inserted, so (q_i, p_i) is in class ≥ t.

3. The columns are the bumping sequences (9) when p_i is inserted. Lines 1 and 2 reflect the operations on row 1; see (14). If we remove columns in which line 2 has ∞ entries, lines 0 and 2 constitute the bumped array, as in (15). The stated method for going from line k to line k + 1 is just the class-determination algorithm of the text.

4.  (a)  Use  a case  analysis,  by  induction  on  the  size  of  the  tableau,  considering  first 
the  effect  on  row  1 and  then  the  effect  on  the  sequence  of  elements  bumped  from 
row  1.  (b)  Admissible  interchanges  can  simulate  the  operations  of  Algorithm  I,  with 
the  tableau  represented  as  a canonical  permutation  before  and  after  the  algorithm.  For 
example,  we  can  transform 

17  11  4 13  14  2 6 10  15  1 3 5 9 12  16  8 into  17  11  13  4 10  14  2 6 9 15  1 3 5 8 12  16 

by  a sequence  of  admissible  interchanges  (see  (4)  and  (5)). 
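The example can be reproduced with a short row-insertion routine. This sketch assumes the canonical permutation lists the rows of the tableau from the bottom row to the top row, which is consistent with the example. It implements the usual bumping step of insertion. The names are illustrative.

```python
def insert(tableau, x):
    """Row insertion: x bumps the smallest entry of row 1 that exceeds it,
    the bumped entry is inserted into row 2, and so on."""
    rows = [row[:] for row in tableau]
    for row in rows:
        bigger = [i for i, y in enumerate(row) if y > x]
        if not bigger:
            row.append(x)
            return rows
        i = bigger[0]
        row[i], x = x, row[i]
    rows.append([x])
    return rows


def canonical(tableau):
    """Read the rows from the bottom row up to the top row."""
    return [y for row in reversed(tableau) for y in row]


# Tableau whose canonical permutation is 17 11 4 13 14 2 6 10 15 1 3 5 9 12 16.
P = [[1, 3, 5, 9, 12, 16], [2, 6, 10, 15], [4, 13, 14], [11], [17]]
```

Inserting 8 bumps 9, which bumps 10, which bumps 13, which lands at the end of the row containing 11.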

5. Admissible interchanges are symmetrical between left and right, and the canonical permutation for P obviously goes into P^T when the insertion order is reversed.

6. Let there be t classes in all; exactly k of them have an odd number of elements, since the elements of a class have the form

(p_{i_k}, p_{i_1}), (p_{i_{k−1}}, p_{i_2}), ..., (p_{i_1}, p_{i_k}).

(See (18) and (22).) The bumped two-line array has exactly t − k fixed points, because of the way it is constructed; hence by induction the tableau minus its first row has t − k columns of odd length. So the t elements in the first row lead to k odd-length columns in the whole tableau.

7. The number of columns, namely the length of row 1, is the number of classes (exercise 2). The number of rows is the number of columns of P^T, so exercise 5 (or Theorem D) completes the proof.

8. With more than n² elements, the corresponding P tableau must either have more than n rows or more than n columns. But there are n × n tableaux. [This result was originally proved in Compositio Math. 2 (1935), 463-470.]
9. Such permutations are in 1-1 correspondence with pairs of tableaux of shape (n, n, ..., n); so by (34) the answer is

$\left(\frac{(n^2)!\,\Delta(2n-1, 2n-2, \ldots, n)}{(2n-1)!\,(2n-2)!\cdots n!}\right)^2 = \left(\frac{(n^2)!}{(2n-1)(2n-2)^2\cdots(n+1)^{n-1}\,n^n\,(n-1)^{n-1}\cdots 2^2\,1^1}\right)^2.$

the  total  distance  traveled. 


The  existence  of  such  a simple  formula  for  this  problem  is  truly  amazing.  We  can  also 
count  the  number  of  permutations  of  {1,2, ...  ,mn}  with  no  increasing  subsequences 
longer  than  m,  no  decreasing  subsequences  longer  than  n. 

10.  We  prove  inductively  that,  at  step  S3,  P(r-i)s  and  Pr(s-\)  are  both  less  than 
^(r+i )s  and  Pr(s+iy 

11.  We  also  need  to  know,  of  course,  the  element  that  was  originally  Pu.  Then  it  is 
possible  to  restore  things  using  an  algorithm  remarkably  similar  to  Algorithm  S. 

■ (n,2+1)  + (",2+2)  + -+(""2+")-(m31' 

The  minimum  is  the  sum  of  the  first  n terms  of  the  sequence  1,  2,  2,  3,  3,  3,  4,  4,  4,  4, 
5,  5,  5,  5,  5,  . . . of  exercise  1.2.4-41;  this  sum  is  approximately  ^/8/9  n3^2.  (Nearly 
all  tableaux  on  n elements  come  reasonably  close  to  this  lower  bound,  according  to 
exercise  29,  so  the  average  number  of  times  is  0(n3^2).) 

13. Assume that the elements permuted are $\{1, 2, \ldots, n\}$, so that $a_i = 1$; and assume
that $a_j = 2$. Case 1: $j < i$. Then 1 bumps 2, so row 1 of the tableau corresponding to
$a_1 \ldots a_{i-1}\,a_{i+1} \ldots a_n$ is row 1 of $P^S$; and the bumped permutation is the former bumped
permutation except for its smallest element, 2, so we may use induction on $n$. Case 2:
$j > i$. Apply Case 1 to $P^T$, in view of exercise 5 and the fact that $(P^T)^S = (P^S)^T$.

15. As in (37), the example permutation corresponds to the tableau

    1  2  5  9  11
    3  6  7
    4  8  10;

hence the number is $f(l, m, n) = (l+m+n)!\,(l-m+1)(l-n+2)(m-n+1)\big/\bigl((l+2)!\,(m+1)!\,n!\bigr)$,
provided, of course, that $l \ge m \ge n$.

16.  By  Theorem  H,  80080. 
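Theorem H can be evaluated mechanically. Below is a minimal Python sketch of the hook length formula. It reproduces 80080 for the shape (6, 4, 4, 1); we assume that is the shape meant in exercise 16, since the exercise statement is not part of this excerpt. It also agrees with the three-row formula of exercise 15. Squaring its value for an n x n shape gives the count of exercise 9. The function names are illustrative.

```python
from math import factorial

def hook_count(shape):
    """Number of standard tableaux of the given shape, by the hook length formula."""
    shape = [r for r in shape if r > 0]
    if not shape:
        return 1
    col_heights = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hook_product = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            arm = row_len - j - 1          # cells to the right of (i, j)
            leg = col_heights[j] - i - 1   # cells below (i, j)
            hook_product *= arm + leg + 1
    return factorial(sum(shape)) // hook_product

def three_rows(l, m, n):
    """The closed form quoted in exercise 15, valid for l >= m >= n."""
    return (factorial(l + m + n) * (l - m + 1) * (l - n + 2) * (m - n + 1)
            // (factorial(l + 2) * factorial(m + 1) * factorial(n)))

print(hook_count((6, 4, 4, 1)))                        # 80080
print(hook_count((5, 3, 3)), three_rows(5, 3, 3))      # 660 660
print([hook_count((n,) * n) ** 2 for n in range(1, 5)])  # [1, 4, 1764, 577152576]
```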

17. Since $g$ is antisymmetric in the $x$'s, it is zero when $x_i = x_j$, so it is divisible by
$x_i - x_j$ for all $i < j$. Hence $g(x_1, \ldots, x_n; y) = h(x_1, \ldots, x_n; y)\,\Delta(x_1, \ldots, x_n)$. Here $h$
must be homogeneous in $x_1, \ldots, x_n, y$, of total degree 1, and symmetric in $x_1, \ldots, x_n$;
so $h(x_1, \ldots, x_n; y) = a(x_1 + \cdots + x_n) + by$ for some $a$, $b$ depending only on $n$. We can
evaluate $a$ by setting $y = 0$; we can evaluate $b$ by taking the partial derivative with
respect to $y$ and then setting $y = 0$. We have

$\frac{\partial}{\partial y}\Delta(x_1, \ldots, x_i + y, \ldots, x_n)\Big|_{y=0} = \frac{\partial}{\partial x_i}\Delta(x_1, \ldots, x_n) = \Delta(x_1, \ldots, x_n)\sum_{j \ne i}\frac{1}{x_i - x_j}.$

Finally, $\sum_i \sum_{j \ne i} x_i/(x_i - x_j) = \sum_i \sum_{j<i}\bigl(x_i/(x_i - x_j) + x_j/(x_j - x_i)\bigr) = \binom{n}{2}$.

18. It must be $\Delta(x_1, \ldots, x_n) \cdot (b_0 + b_1 y + \cdots + b_m y^m)$, where each $b_k$ is a homogeneous
symmetric polynomial of degree $m - k$ in the $x$'s. We have

$\frac{1}{k!}\frac{\partial^k}{\partial y^k}\Delta(x_1, \ldots, x_i + y, \ldots, x_n)\Big|_{y=0} = \Delta(x_1, \ldots, x_n)\sum\Bigl(1\Big/\prod_{l=1}^{k}(x_i - x_{j_l})\Bigr),$

summed over all $\binom{n-1}{k}$ choices of distinct indices $j_1, \ldots, j_k \ne i$. Now, in the expression
$b_k = \sum x_i^m \big/ \prod_{l=1}^{k}(x_i - x_{j_l})$, we may combine those groups of $k + 1$ terms having a given
set of indices $\{i, j_1, \ldots, j_k\}$; for example, when $k = 2$, we group sets of three terms of
the form $a^m/(a-b)(a-c) + b^m/(b-a)(b-c) + c^m/(c-a)(c-b)$. The sum of every
such group is $[z^{m-k}]\,1/(1 - x_i z)(1 - x_{j_1} z) \ldots (1 - x_{j_k} z)$, by exercise 1.2.3-33. We find
therefore that

$b_k = \sum_{j \ge 0}\binom{n-j}{k+1-j}\sum_{\substack{p_1 \ge \cdots \ge p_j \ge 1\\ p_1 + \cdots + p_j = m-k}} s(p_1, \ldots, p_j),$

where $s(p_1, \ldots, p_j)$ is the monomial symmetric function consisting of all distinct terms
having the form $x_{i_1}^{p_1} \ldots x_{i_j}^{p_j}$, for distinct indices $i_1, \ldots, i_j \in \{1, \ldots, n\}$; and the inner
sum is over all partitions of $m - k$ into exactly $j$ parts, namely $p_1 \ge \cdots \ge p_j \ge 1$,
$p_1 + \cdots + p_j = m - k$. (This result was obtained jointly with E. A. Bender in 1969.)

When $m = 2$ the answer is $\bigl(s(2) + (n-1)s(1)y + \binom{n}{3}y^2\bigr)\Delta(x_1, \ldots, x_n)$; for $m = 3$
we get $\bigl(s(3) + ((n-1)s(2) + s(1,1))y + \binom{n-1}{2}s(1)y^2 + \binom{n}{4}y^3\bigr)\Delta(x_1, \ldots, x_n)$; etc.

Another expression gives $b_k$ as the coefficient of $z^m$ in

$z^k\Bigl(\binom{n}{k+1} - \binom{n-1}{k+1}e_1 z + \binom{n-2}{k+1}e_2 z^2 - \cdots\Bigr)\Big/(1 - e_1 z + e_2 z^2 - \cdots),$

where $e_j = \sum_{1 \le i_1 < \cdots < i_j \le n} x_{i_1} \ldots x_{i_j}$ is an elementary symmetric function. Multiplying
by $y^k$ and summing on $k$ gives the answer as the coefficient of $z^m$ in

$\frac{1}{yz}\,\frac{(1 + z(y - x_1)) \cdots (1 + z(y - x_n))}{(1 - zx_1) \cdots (1 - zx_n)}\,\Delta(x_1, \ldots, x_n).$


19.  Let  the  shape  of  the  transposed  tableau  be  (n\,n 2, . . . , n'r);  the  answer  is 


■zf(ni,n2,...,nn 


n(n  — 1) 


+ 1), 


where  n = JZ  rij  = n'j  ■ (This  formula  can  be  expressed  in  a less  symmetrical  form 
using  the  relation  = |(n  + ri'2 ) . ) 

Note: W. Feit [Proc. Amer. Math. Soc. 4 (1953), 740-744] showed that the number
of ways to place the integers $\{1, 2, \ldots, n\}$ into an array that is the "difference" of two
tableau shapes $(n_1, \ldots, n_m) \setminus (l_1, \ldots, l_m)$, where $0 \le l_j \le n_j$ and $n = \sum (n_j - l_j)$, is

$n!\,\det\bigl(1/((n_j - j) - (l_i - i))!\bigr).$

20.  The  fallacious  argument  in  the  discussion  following  Theorem  H is  actually  valid 
for  this  case  (the  corresponding  probabilities  are  independent). 

Note:  If  we  consider  all  n!  ways  to  label  the  nodes,  the  labelings  considered  here 
are  those  having  no  “inversions.”  Inversions  in  permutations  are  the  same  as  inversions 
in tree labelings, in the special case when the tree is simply a path. See A. Björner and
M. L. Wachs, J. Combinatorial Theory A52 (1989), 165-187.

21. [Michigan Math. J. 1 (1952), 81-88.] Let $g(n_1, \ldots, n_m) = (n_1 + \cdots + n_m)!\,\Delta(n_1, \ldots, n_m)\big/\bigl(n_1! \ldots n_m!\,\sigma(n_1, \ldots, n_m)\bigr)$,
where $\sigma(x_1, \ldots, x_m) = \prod_{1 \le i < j \le m}(x_i + x_j)$.
To prove that $g(n_1, \ldots, n_m)$ is the number of ways to fill the shifted tableau, we
must prove that $g(n_1, \ldots, n_m) = g(n_1 - 1, \ldots, n_m) + \cdots + g(n_1, \ldots, n_m - 1)$. The identity
corresponding to exercise 17 is
$x_1\Delta(x_1 + y, \ldots, x_n)/\sigma(x_1 + y, \ldots, x_n) + \cdots + x_n\Delta(x_1, \ldots, x_n + y)/\sigma(x_1, \ldots, x_n + y) = (x_1 + \cdots + x_n)\,\Delta(x_1, \ldots, x_n)/\sigma(x_1, \ldots, x_n)$,
independent of $y$; for if we calculate the derivative as in exercise 17, we find that
$2x_ix_j/(x_j^2 - x_i^2) + 2x_jx_i/(x_i^2 - x_j^2) = 0$.
22. Assume that $m = N$, by adding 0s to the shape if necessary; if $m > N$ and $n_m > 0$,
the number of ways is clearly zero. When $m = N$ the answer is

$\det\begin{pmatrix}
\binom{n_1+m-1}{m-1} & \binom{n_2+m-2}{m-1} & \cdots & \binom{n_m}{m-1}\\
\vdots & \vdots & & \vdots\\
\binom{n_1+m-1}{0} & \binom{n_2+m-2}{0} & \cdots & \binom{n_m}{0}
\end{pmatrix}.$

Proof. We may assume that $n_m = 0$, for if $n_m > 0$, the first $n_m$ columns of the array
must be filled with $i$ in row $i$, and we may consider the remaining shape
$(n_1 - n_m, \ldots, n_m - n_m)$. By induction on $m$, the number of ways is

$\sum_{n_2 \le k_1 \le n_1} \cdots \sum_{n_m \le k_{m-1} \le n_{m-1}} \det\begin{pmatrix}
\binom{k_1+m-2}{m-2} & \binom{k_2+m-3}{m-2} & \cdots & \binom{k_{m-1}}{m-2}\\
\vdots & \vdots & & \vdots\\
\binom{k_1+m-2}{0} & \binom{k_2+m-3}{0} & \cdots & \binom{k_{m-1}}{0}
\end{pmatrix},$

where $n_j - k_j$ represents the number of $m$'s in row $j$. The sum on each $k_j$ may be
carried out independently, giving

$\det\begin{pmatrix}
\binom{n_1+m-1}{m-1} - \binom{n_2+m-2}{m-1} & \cdots & \binom{n_{m-1}+1}{m-1} - \binom{n_m}{m-1}\\
\vdots & & \vdots\\
\binom{n_1+m-1}{1} - \binom{n_2+m-2}{1} & \cdots & \binom{n_{m-1}+1}{1} - \binom{n_m}{1}
\end{pmatrix},$

which is the desired answer since $n_m = 0$. The answer can be converted into a
Vandermonde determinant by row operations, giving $\Delta(n_1 + m - 1, n_2 + m - 2, \ldots, n_m)/(m-1)!\,(m-2)! \ldots 0!$.
[The answer to this exercise, in connection with an equivalent
problem in group theory, appears in D. E. Littlewood's Theory of Group Characters
(Oxford, 1940), 189.]
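The closed form at the end of exercise 22 is easy to test against brute force. This Python sketch counts fillings of a shape with entries from {1, ..., N}, where rows are nondecreasing and columns are increasing. It does so in two ways: by direct enumeration, and via the quotient Delta(n_1 + m - 1, ..., n_m)/((m-1)! ... 0!) with m = N. Both helper names are hypothetical. The shape (2, 1) with N = 3 gives the 8 generalized tableaux listed in exercise 33.

```python
from math import factorial

def count_generalized_formula(shape, N):
    """Delta(n1+m-1, n2+m-2, ..., nm) / ((m-1)! ... 0!) with m = N (shape padded with 0s)."""
    parts = [r for r in shape if r > 0]
    if len(parts) > N:
        return 0                              # more than N nonzero rows: impossible
    parts += [0] * (N - len(parts))
    l = [parts[i] + N - 1 - i for i in range(N)]
    numerator = 1
    for a in range(N):
        for b in range(a + 1, N):
            numerator *= l[a] - l[b]
    denominator = 1
    for k in range(N):
        denominator *= factorial(k)
    return numerator // denominator

def count_generalized_brute(shape, N):
    """Count fillings with entries in 1..N, rows nondecreasing, columns increasing."""
    cells = [(i, j) for i, r in enumerate(shape) for j in range(r)]
    T = {}

    def fill(idx):
        if idx == len(cells):
            return 1
        i, j = cells[idx]
        lo = 1
        if j > 0:
            lo = max(lo, T[i, j - 1])         # weakly increasing along the row
        if i > 0:
            lo = max(lo, T[i - 1, j] + 1)     # strictly increasing down the column
        total = 0
        for v in range(lo, N + 1):
            T[i, j] = v
            total += fill(idx + 1)
        return total

    return fill(0)

print(count_generalized_formula((2, 1), 3), count_generalized_brute((2, 1), 3))  # 8 8
```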

23. [Journal de Math. (3) 7 (1881), 167-184.] (This is a special case of exercise 5.1.3-8,
with all runs of length 2 except that the final run might have length 1.) When $n > 2$,
element $n$ must appear in one of the rightmost positions of a row; once it has been
placed in the rightmost box on row $k$ from the bottom, we have $\binom{n-1}{2k-1}E_{2k-1}E_{n-2k}$
ways to complete the job. Let $g(z) = \sum_{n \ge 0} E_n z^n/n!$ and

$h(z) = \sum_{n \ge 1} E_{2n-1} z^{2n-1}/(2n-1)! = \tfrac12\bigl(g(z) - g(-z)\bigr);$

then

$h(z)g(z) = \sum_{k,n \ge 1}\binom{n}{2k-1}E_{2k-1}E_{n-2k+1}\,z^n/n! = \Bigl(\sum_{n \ge 0} E_{n+1}z^n/n!\Bigr) - 1 = g'(z) - 1.$

Replace $z$ by $-z$ and add, obtaining $h(z)^2 = h'(z) - 1$; hence $h(z) = \tan z$. Setting
$k(z) = g(z) - h(z)$, we have $h(z)k(z) = k'(z)$; hence $k(z) = \sec z$ and
$g(z) = \sec z + \tan z = \tan(\frac12 z + \frac14\pi)$. The coefficients $E_n$ are called Euler numbers; with
odd index, $E_{2n-1}$ is the tangent number $T_{2n-1} = (-1)^{n-1}4^n(4^n - 1)B_{2n}/(2n)$. Tables
of these numbers appear in Math. Comp. 21 (1967), 663-688; the sequence begins
$(E_0, E_1, E_2, \ldots) = (1, 1, 1, 2, 5, 16, 61, 272, 1385, 7936, \ldots)$. The easiest way to compute
Euler numbers is probably to form the triangular array

     1
     0   1
     1   1   0
     0   1   2   2
     5   5   4   2   0
     0   5  10  14  16  16
    61  61  56  46  32  16   0

in which partial sums are alternately formed from left to right and right to left [L. Seidel,
Sitzungsberichte math.-phys. Classe Akademie Wissen. München 7 (1877), 157-187].
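Seidel's triangle is straightforward to generate. The sketch below is one possible Python rendering of the rule stated in the answer. Each row starts with 0 and accumulates partial sums of the previous row, with the direction alternating from row to row. The Euler number E_n is read off the nonzero end of row n. The function names are our own.

```python
def seidel_triangle(rows):
    """Rows of Seidel's triangle: each new row starts with 0 and accumulates
    partial sums of the previous row, alternately left-to-right and right-to-left."""
    triangle = [[1]]
    for n in range(1, rows):
        prev = triangle[-1]
        new = [0]
        if n % 2 == 1:                       # odd rows: sums formed left to right
            for value in prev:
                new.append(new[-1] + value)
        else:                                # even rows: sums formed right to left
            for value in reversed(prev):
                new.append(new[-1] + value)
            new.reverse()
        triangle.append(new)
    return triangle

def euler_numbers(count):
    """E_0, ..., E_{count-1}: the nonzero end of each row (row 0 is just [1])."""
    return [max(row[0], row[-1]) for row in seidel_triangle(count)]

for row in seidel_triangle(7):
    print(row)
print(euler_numbers(10))  # [1, 1, 1, 2, 5, 16, 61, 272, 1385, 7936]
```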

25. In general, if $u_{nk}$ is the number of permutations on $\{1, 2, \ldots, n\}$ having no cycles
of length $> k$, $\sum u_{nk}z^n/n! = \exp(z + z^2/2 + \cdots + z^k/k)$; this is proved by multiplying
$\exp(z) \times \cdots \times \exp(z^k/k)$, obtaining

$\sum_{j_1 + 2j_2 + \cdots + kj_k = n}\frac{z^n}{1^{j_1}j_1!\,2^{j_2}j_2! \ldots k^{j_k}j_k!};$

see also exercise 1.3.3-21. Similarly, $\exp(\sum_{s \in S} z^s/s)$ is the corresponding generating
function for permutations whose cycle lengths are all members of a given set $S$.
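The generating function exp(z + z^2/2 + ... + z^k/k) also yields a simple recurrence. It comes from U'(z) = (1 + z + ... + z^(k-1))U(z): element n+1 lies in a cycle of length j+1, and the other j members of that cycle are an ordered selection from the remaining n elements, which can be made in n!/(n-j)! ways. The Python sketch below is our own derivation and naming, not taken from the text. It uses that recurrence and cross-checks it by brute force. For k = 2 it gives the involution numbers.

```python
from itertools import permutations

def short_cycle_counts(n_max, k):
    """u[n] for n = 0..n_max: permutations of n elements with no cycle longer than k."""
    u = [1]
    for n in range(n_max):
        total, ways = 0, 1                   # ways = n!/(n-j)!
        for j in range(min(k - 1, n) + 1):
            total += ways * u[n - j]
            ways *= n - j
        u.append(total)
    return u

def longest_cycle(perm):
    """Length of the longest cycle of a permutation of range(len(perm))."""
    seen, longest = set(), 0
    for start in range(len(perm)):
        length, x = 0, start
        while x not in seen:                 # walk an unvisited cycle once
            seen.add(x)
            x = perm[x]
            length += 1
        longest = max(longest, length)
    return longest

# Brute-force confirmation for cycles of length at most 3.
brute = [sum(longest_cycle(p) <= 3 for p in permutations(range(n))) for n in range(7)]
assert brute == short_cycle_counts(6, 3)
print(short_cycle_counts(6, 2))  # [1, 1, 2, 4, 10, 26, 76]
```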

26. The integral from 0 to $\infty$ is $n^{(t+1)/4}\,\Gamma((t+1)/2)/2^{(t+3)/2}$, by the gamma function
integral (exercise 1.2.5-20, $t = 2x^2/\sqrt{n}$). So, from $-\infty$ to $\infty$, we get 0 when $t$ is odd,
otherwise $n^{(t+1)/4}\sqrt{\pi}\,t!/2^{(3t+1)/2}(t/2)!$.

27. (a) If $r_i < r_{i+1}$ and $c_i < c_{i+1}$, the condition $i < Q_{r_i c_{i+1}} < i + 1$ is impossible.
If $r_i > r_{i+1}$ and $c_i > c_{i+1}$, we certainly cannot have $i + 1 < Q_{r_i c_{i+1}} < i$. (b) Prove,
by induction on the number of rows in the tableau for $a_1 \ldots a_t$, that $a_i < a_{i+1}$ implies
$c_i < c_{i+1}$, and $a_i > a_{i+1}$ implies $c_i \ge c_{i+1}$. (Consider row 1 and the "bumped"
sequences.) (c) This follows from Theorem D(c).

28. This result is due to A. M. Vershik and S. V. Kerov, Dokl. Akad. Nauk SSSR
233 (1977), 1024-1028; see also B. F. Logan and L. A. Shepp, Advances in Math. 26
(1977), 206-222. [J. Baik, P. Deift, and K. Johansson, J. Amer. Math. Soc. 12 (1999),
1119-1178, showed that the standard deviation is $O(n^{1/6})$; moreover, the probability
that the length is less than $2\sqrt{n} + tn^{1/6}$ approaches $\exp\bigl(-\int_t^\infty (x - t)u^2(x)\,dx\bigr)$, where
$u''(x) = 2u^3(x) + xu(x)$ and $u(x)$ is asymptotic to the Airy function $\mathrm{Ai}(x)$ as $x \to \infty$.]

29. $\binom{n}{l}/l!$ is the average number of increasing subsequences of length $l$. (By exercises
8 and 29, the probability is $O(1/\sqrt{n})$ that the largest increasing sequence has length
$> e\sqrt{n}$ or $< \sqrt{n}/e$.) [J. D. Dixon, Discrete Math. 12 (1975), 139-142.]

30.  [Discrete  Math.  2 (1972),  73-94;  a simplified  proof  has  been  given  by  Marc  van 
Leeuwen,  Electronic  J.  Combinatorics  3,2  (1996),  paper  #R15.] 

31. $x_n = a_{\lfloor n/2 \rfloor}$ where $a_0 = 1$, $a_1 = 2$, $a_n = 2a_{n-1} + (2n-2)a_{n-2}$; $\sum a_n z^n/n! =
\exp(2z + z^2) = \bigl(\sum t_n z^n/n!\bigr)^2$; $x_n \sim \exp(\frac14 n\ln n - \frac14 n + \sqrt{n} - \frac12 - \frac12\ln 2)$ for $n$ even. [See
E. Lucas, Théorie des Nombres (1891), 217-223.]
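The recurrence for $a_n$ and the identity $\exp(2z + z^2) = (\sum t_n z^n/n!)^2$ can be checked numerically, because the identity is equivalent to $a_n = \sum_k \binom{n}{k} t_k t_{n-k}$. Below is a short Python check. It uses the involution numbers $t_n$ with $t_{n+1} = t_n + n\,t_{n-1}$, the recurrence that appears in exercise 32. It does not attempt the asymptotic estimate, and the function names are ours.

```python
from math import comb

def a_sequence(count):
    """a_0 = 1, a_1 = 2, a_n = 2 a_{n-1} + (2n - 2) a_{n-2}."""
    a = [1, 2]
    for n in range(2, count):
        a.append(2 * a[n - 1] + (2 * n - 2) * a[n - 2])
    return a[:count]

def involutions(count):
    """t_0 = t_1 = 1, t_{n+1} = t_n + n t_{n-1}."""
    t = [1, 1]
    for n in range(1, count - 1):
        t.append(t[n] + n * t[n - 1])
    return t[:count]

# exp(2z + z^2) = (sum t_n z^n/n!)^2  <=>  a_n = sum_k C(n, k) t_k t_{n-k}.
a, t = a_sequence(12), involutions(12)
assert all(a[n] == sum(comb(n, k) * t[k] * t[n - k] for k in range(n + 1)) for n in range(12))
print(a[:7])  # [1, 2, 6, 20, 76, 312, 1384]
```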

32. Let $m_n = \int_{-\infty}^{\infty} t^n e^{-(t-1)^2/2}\,dt/\sqrt{2\pi}$. Then $m_0 = m_1 = 1$, and $m_{n+1} - m_n = n\,m_{n-1}$
if we integrate by parts. So $m_n = t_n$ by (40).
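The integration by parts in exercise 32 can be written out as follows, for $n \ge 1$. The boundary term vanishes because of the Gaussian factor. We have $m_0 = m_1 = 1$ because for $n = 0$ the integrand, divided by $\sqrt{2\pi}$, is the normal density with mean 1.

```latex
m_{n+1}-m_n=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}t^{n}(t-1)\,e^{-(t-1)^2/2}\,dt
=\frac{1}{\sqrt{2\pi}}\Bigl[-t^{n}e^{-(t-1)^2/2}\Bigr]_{-\infty}^{\infty}
+\frac{n}{\sqrt{2\pi}}\int_{-\infty}^{\infty}t^{n-1}e^{-(t-1)^2/2}\,dt
=n\,m_{n-1}.
```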

33. True; it is $\det\bigl(\binom{a_i}{j-1}\bigr)$. [Mitchell, in Amer. J. Math. 4 (1881), 341-344, showed
that it is the number of terms in the expansion of a certain symmetric function, now
called a Schur function. Indeed, if $0 \le a_1 < \cdots < a_m$, it is the number of terms in
$S_{n_1 n_2 \ldots n_m}(x_1, x_2, \ldots, x_m)$, where $n_1 = a_m - (m - 1)$, $n_2 = a_{m-1} - (m - 2)$, $\ldots$, $n_m = a_1$.
This Schur function is the sum over all generalized tableaux of shape $(n_1, \ldots, n_m)$
with elements in $\{1, \ldots, m\}$ of the products of $x_j$ for all $j$ in the tableau, where a
generalized tableau is like an ordinary tableau except that equal elements are allowed
in the rows. In this definition we allow the parameters $n_k$ to be zero. For example,
$S_{210}(x_1, x_2, x_3) = x_1^2x_2 + x_1^2x_3 + x_1x_2^2 + x_1x_2x_3 + x_1x_2x_3 + x_1x_3^2 + x_2^2x_3 + x_2x_3^2$, because
of the generalized tableaux whose first and second rows are 11|2, 11|3, 12|2, 12|3, 13|2, 13|3,
22|3, 23|3. The number of such tableaux is
$\Delta(1, 3, 5)/\Delta(1, 2, 3) = 8$. By extending Algorithms I and D to generalized tableaux
[Pacific J. Math. 34 (1970), 709-727], we can obtain combinatorial proofs of the
remarkable identities

$\sum_\lambda S_\lambda(x_1, \ldots, x_m)\,S_\lambda(y_1, \ldots, y_n) = \prod_{i=1}^{m}\prod_{j=1}^{n}\frac{1}{1 - x_iy_j},$

$\sum_\lambda S_\lambda(x_1, \ldots, x_m)\,S_{\lambda^T}(y_1, \ldots, y_n) = \prod_{i=1}^{m}\prod_{j=1}^{n}(1 + x_iy_j);$

here the sum is over all possible shapes $\lambda$, and $\lambda^T$ denotes the transposed shape. These
identities were first discovered by D. E. Littlewood, Proc. London Math. Soc. (2) 40
(1936), 40-70, Theorem V.]

Notes: It follows, for example, that any product of consecutive binomial coefficients
$\binom{a}{k}\binom{a+1}{k} \ldots \binom{a+l}{k}$ is divisible by $\binom{k}{k}\binom{k+1}{k} \ldots \binom{k+l}{k}$, since the ratio is
$\Delta(a + l, \ldots, a + 1, a, k - 1, \ldots, 1, 0)/\Delta(k + l, \ldots, 1, 0)$. The value of $\Delta(k, \ldots, 1, 0) = k! \ldots 2!\,1!$
is sometimes called a "superfactorial."

34. The length of a hook is also the length of any zigzag path from the hook's bottom
left cell $(i, j)$ to its top right cell $(i', j')$. We prove a stronger result: If there is a hook
of length $a + b$, then there is either a hook of length $a$ or a hook of length $b$. Consider
the cells $(i, j) = (i_1, j_1), (i_2, j_2), \ldots, (i_{a+b}, j_{a+b}) = (i', j')$ that hug the bottom of the
shape. If $j_{a+1} = j_a$, the cell $(i_a, j_1)$ has a hook of length $a$; otherwise $(i_{a+b}, j_{a+1})$
has a hook of length $b$. [Reference: Japanese J. Math. 17 (1940), 165-184, 411-423.
Nakayama was the first to consider hooks in the study of permutation groups, and he
came close to discovering Theorem H.]

35. The execution of steps G3-G5 decreases exactly $h_{ij}$ elements of the $p$ array by 1
when $q_{ij}$ is increased, because the algorithm follows a zigzag path from $p_{n'_j j}$ to $p_{in_i}$.
The next execution of those steps either starts with a larger value of $j$ or stays above
or equal to the preceding zigzag. Therefore the $q$ array is filled from left to right and
bottom to top; to reverse the process we proceed from right to left and top to bottom:

H1. [Initialize.] Set $p_{ij} \leftarrow 0$ for $1 \le j \le n_i$ and $1 \le i \le n'_1$. Then set $i \leftarrow 1$ and
$j \leftarrow n_1$.

H2. [Find nonzero cell.] If $q_{ij} > 0$, go on to step H3. Otherwise if $i < n'_j$, increase
$i$ by 1 and repeat this step. Otherwise if $j > 1$, decrease $j$ by 1, set $i \leftarrow 1$,
and repeat this step. Otherwise stop (the $q$ array is now zero).

H3. [Decrease $q$, prepare for zigzag.] Decrease $q_{ij}$ by 1 and set $l \leftarrow i$, $k \leftarrow n_i$.

H4. [Increase $p$.] Increase $p_{lk}$ by 1.

H5. [Move down or left.] If $l < n'_k$ and $p_{lk} > p_{(l+1)k}$, increase $l$ by 1 and return
to H4. Otherwise if $k > j$, decrease $k$ by 1 and return to H4. Otherwise
return to H2. |

The first zigzag path for a given column $j$ ends by incrementing $p_{n'_j j}$, because
$p_{ij} \le \cdots \le p_{n'_j j}$ implies that $p_{n'_j j} > 0$. Each subsequent path for column $j$ stays below or
equal to the previous one, so it also ends at $p_{n'_j j}$. The inequalities encountered on the
way show that this algorithm inverts the other. [J. Combinatorial Theory A21 (1976),
216-221.]

36. (a) The stated coefficient of $z^m$ is the number of solutions to $m = \sum h_{ij}q_{ij}$, so we
can apply the result of the previous exercise. (b) If $a_1, \ldots, a_k$ are any positive integers,
we can prove by induction on $k$ that

$[z^m]\,1/(1 - z)(1 - z^{a_1}) \ldots (1 - z^{a_k}) = \binom{m}{k}\big/a_1 \ldots a_k + O(m^{k-1}).$

The number of partitions of $m$ with at most $n$ parts is therefore $\binom{m}{n-1}/n! + O(m^{n-2})$
for fixed $n$, by exercise 5.1.1-15. This is also the asymptotic number of partitions
$m = p_1 + \cdots + p_n$ with distinct parts $p_1 > \cdots > p_n \ge 0$ (see exercise 5.1.1-16). So the
number of reverse plane partitions is asymptotically $N\binom{m}{n-1}/n! + O(m^{n-2})$ when there
are $N$ tableaux of a given $n$-cell shape. By part (a) this is also $\binom{m}{n-1}\big/\prod h_{ij} + O(m^{n-2})$.
[Studies in Applied Math. 50 (1971), 167-188, 259-279.]

37. Plane partitions in a rectangle are equivalent to reverse plane partitions, so the
hook lengths tell us the generating function $1\big/\prod_{i=1}^{r}\prod_{j=1}^{c}(1 - z^{i+j-1})$ in an $r \times c$
rectangle. Letting $r, c \to \infty$ yields the elegant answer $1/(1 - z)(1 - z^2)^2(1 - z^3)^3 \ldots$.
[MacMahon's original derivation in Philosophical Transactions A211 (1912), 75-110,
345-373, was extremely complicated. The first reasonably simple proof was found by
Leonard Carlitz, Acta Arithmetica 13 (1967), 29-47.]
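The coefficients of MacMahon's product are easy to expand, and easy to confirm by brute force for small totals. In this Python sketch, `plane_partition_counts` expands $1/(1-z)(1-z^2)^2(1-z^3)^3\ldots$ by dividing repeatedly by $(1-z^k)$. The helper `brute_plane_partitions` is hypothetical and runs in exponential time. It enumerates arrays of positive integers with weakly decreasing rows and columns directly.

```python
def plane_partition_counts(N):
    """Coefficients of z^0..z^N in 1/((1-z)(1-z^2)^2(1-z^3)^3...)."""
    coeffs = [1] + [0] * N
    for k in range(1, N + 1):
        for _ in range(k):                   # divide by (1 - z^k), k times
            for m in range(k, N + 1):
                coeffs[m] += coeffs[m - k]
    return coeffs

def brute_plane_partitions(total):
    """Count arrays of positive integers, rows and columns weakly decreasing, summing to total."""
    def rows_under(bound, budget, row=()):
        # Nonempty weakly decreasing rows r with r[i] <= bound[i] and sum(r) <= budget.
        if row:
            yield row
        i = len(row)
        if i < len(bound):
            top = min(bound[i], budget, row[-1] if row else bound[i])
            for v in range(1, top + 1):
                yield from rows_under(bound, budget - v, row + (v,))

    def count(bound, remaining):
        if remaining == 0:
            return 1
        return sum(count(row, remaining - sum(row)) for row in rows_under(bound, remaining))

    return count((total,) * total, total)

print(plane_partition_counts(8))  # [1, 1, 3, 6, 13, 24, 48, 86, 160]
```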

38.  (a)  The  probability  is  1/n  when  k = l = 1;  otherwise  it  is 


nP(I  \ {ip}, 


n dioio 


by  induction  on  k + l. 

(b) Summing over all $I$ and $J$ gives

$\frac1n\,(1 + d_{1b}) \ldots (1 + d_{(a-1)b})\,(1 + d_{a1}) \ldots (1 + d_{a(b-1)}),$

which is easily seen to equal $f(T \setminus \{(a, b)\})/f(T)$.

(c) The sum over all corners yields 1, because every path ends at a corner.
Therefore $\sum f(T \setminus \{(a, b)\}) = f(T)$, and this proves Theorem H by induction on $n$.
Furthermore, if we put $n$ into the corner cell at the end of the random path and repeat
the process on the remaining $n - 1$ cells, we get each tableau with probability $1/f(T)$.
[Advances in Math. 31 (1979), 104-109.]
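Part (c) describes a way to generate a uniformly random tableau. Below is a minimal Python sketch of that procedure. It assumes the usual hook walk: start from a uniformly chosen cell, then repeatedly jump to a uniformly chosen other cell of the current hook until a corner is reached. The sketch only checks that every output is a standard tableau and that all 16 tableaux of shape (3, 2, 1) eventually appear. It does not test exact uniformity.

```python
import random

def hook_walk_tableau(shape, rng):
    """Random standard tableau: walk to a corner via random hook jumps, place the
    current largest number there, remove that corner, and repeat."""
    lengths = list(shape)
    T = [[0] * r for r in shape]
    for value in range(sum(shape), 0, -1):
        cells = [(i, j) for i, r in enumerate(lengths) for j in range(r)]
        i, j = rng.choice(cells)
        while True:
            arm = lengths[i] - j - 1
            leg = sum(1 for r in lengths[i + 1:] if r > j)
            if arm + leg == 0:
                break                        # (i, j) is a corner
            step = rng.randrange(arm + leg)
            if step < arm:
                j += step + 1                # jump right within the row
            else:
                i += step - arm + 1          # jump down within the column
        T[i][j] = value
        lengths[i] -= 1
    return tuple(tuple(row) for row in T)

def is_standard(T):
    """Entries 1..n, increasing along rows and down columns."""
    n = sum(len(row) for row in T)
    entries = sorted(x for row in T for x in row)
    rows_ok = all(row[j] < row[j + 1] for row in T for j in range(len(row) - 1))
    cols_ok = all(T[i][j] < T[i + 1][j] for i in range(len(T) - 1) for j in range(len(T[i + 1])))
    return entries == list(range(1, n + 1)) and rows_ok and cols_ok

rng = random.Random(2024)
samples = [hook_walk_tableau((3, 2, 1), rng) for _ in range(4000)]
assert all(is_standard(T) for T in samples)
print(len(set(samples)), "distinct tableaux of shape (3, 2, 1) seen")
```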


39. (a) $Q_{11} \ldots Q_{1n}$ will be $b_1 \ldots b_n$, the inversion table of the original permutation
$P_{11} \ldots P_{1n}$. (See Section 5.1.1.)

(b) $Q_{11} \ldots Q_{n1}$ is the negated inversion table $(-C_1) \ldots (-C_n)$ of exercise 5.1.1-7.

(c) This condition is clearly preserved by step P3.

(d)  (23)  ((2!)’  (0  (12)  -»•  ((34)’  (0  o1))-  This  example  shows  that  we 

cannot  run  step  P3  backwards  without  looking  at  the  array  P. 


12 

10 

8 

14 

15 

11 

9 

13 

7 

1 

6 

4 

5 

16 

3 

2 
(f) The following algorithm is correct, but not obviously so.

Q1. [Loop on $(i, j)$.] Perform steps Q2 and Q3 for all cells $(i, j)$ of the array in
lexicographic order (that is, from top to bottom, and from left to right in
each row); then stop.

Q2. [Adjust $Q$.] Find the "first candidate" $(r, s)$ by the rule below. Then set
$Q_{i(k+1)} \leftarrow Q_{ik} - 1$ for $j \le k < s$.

Q3. [Unfix $P$ at $(i, j)$.] Set $K \leftarrow P_{rs}$. Then do the following operations until
$(r, s) = (i, j)$: If $P_{(r-1)s} > P_{r(s-1)}$, set $P_{rs} \leftarrow P_{(r-1)s}$ and $r \leftarrow r - 1$; otherwise
set $P_{rs} \leftarrow P_{r(s-1)}$ and $s \leftarrow s - 1$. Finally set $P_{ij} \leftarrow K$. |

In step Q2, cell $(r, s)$ is a candidate when $s \ge j$ and $Q_{is} \le 0$ and $r = i - Q_{is}$. Let $T$
be the oriented tree of the hint. One of the basic invariants of Algorithm Q is that there
will be a path from $(r, s)$ to $(i, j)$ in $T$ whenever $(r, s)$ is a candidate in step Q2. The
reverse of that path can be encoded by a sequence of letters D, Q, and R, meaning that
we start at $(i, j)$, then go down (D) or to the right (R) or quit (Q). The first candidate
is the one whose code is lexicographically first in alphabetic order; intuitively, it is the
candidate with the "leftmost and bottommost" path.

For example, the candidates when $(i, j) = (1, 1)$ in the example of part (e) are
(3,1), (4,2), (2,3), (2,4), and (1,6). Their respective codes are DDQ, DDDRQ, RDRQ,
RDRRQ, and RRRRRQ; so the first is (4,2).

Algorithm P is a slightly simplified version of a construction stated without proof in
Funkts. Analiz i Ego Priloz. 26,3 (1992), 80-82. The proof of correctness is nontrivial;
a proof was given by J.-C. Novelli, I. Pak, and A. V. Stoyanovskii in Disc. Math. and
Theoretical Comp. Sci. 1 (1997), 53-67.

40. An equivalent process was analyzed by H. Rost, Zeitschrift für Wahrscheinlichkeitstheorie
und verwandte Gebiete 58 (1981), 41-53.

41. (Solution by R. W. Floyd.) A deletion-insertion operation essentially moves only $a_i$.
In a sequence of such operations, unmoved elements retain their relative order. Therefore
if $\pi$ can be sorted with $k$ deletion-insertions, it has an increasing subsequence
of length $n - k$; and conversely. Hence $\mathrm{dis}(\pi) = n - (\text{length of longest increasing
subsequence of } \pi) = n - (\text{length of row 1 in Theorem A})$.

M. L. Fredman has proved that the minimum number of comparisons needed to
compute this length is $n\lg n - n\lg\lg n + O(n)$ [Discrete Math. 11 (1975), 29-35].
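Floyd's characterization gives an $O(n \log n)$ way to compute $\mathrm{dis}(\pi)$. The Python sketch below finds the longest increasing subsequence by patience sorting. That is a standard technique, not one described in this answer. The sketch then confirms $\mathrm{dis}(\pi) = n - \mathrm{LIS}$ against a breadth-first search over all permutations of up to five elements. The helper names are ours.

```python
from bisect import bisect_left
from collections import deque
from itertools import permutations

def lis_length(seq):
    """Longest increasing subsequence via patience sorting, O(n log n)."""
    tails = []                       # tails[k] = smallest tail of an increasing run of length k+1
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def dis(perm):
    """Deletion-insertion distance to sorted order, per Floyd's characterization."""
    return len(perm) - lis_length(perm)

def bfs_distances(n):
    """Exact distances from the identity under single delete-and-reinsert moves."""
    start = tuple(range(n))
    dist, queue = {start: 0}, deque([start])
    while queue:
        p = queue.popleft()
        for i in range(n):
            rest = p[:i] + p[i + 1:]
            for j in range(n):
                q = rest[:j] + (p[i],) + rest[j:]
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist

for n in range(1, 6):
    d = bfs_distances(n)
    assert len(d) == len(list(permutations(range(n))))
    assert all(dis(p) == k for p, k in d.items())
print(dis([4, 1, 2, 3]))  # 1
```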

42. Construct a multigraph that has vertices $\{0_R, 1_L, 1_R, \ldots, n_L, n_R, (n+1)_L\}$ and
edges $k_R \mathrel{-\!-} (k+1)_L$ for $0 \le k \le n$; also include the edges $0_R \mathrel{-\!-} 7_R$, $7_L \mathrel{-\!-} 1_L$,
$1_R \mathrel{-\!-} 2_L$, $2_R \mathrel{-\!-} 4_L$, $4_R \mathrel{-\!-} 5_L$, $5_R \mathrel{-\!-} 3_L$, $3_R \mathrel{-\!-} 6_R$, $6_L \mathrel{-\!-} 8_L$, which define the
"bonds" of Lobelia fervens. Exactly two edges touch each vertex, so the connected
components are cycles: $(0_R\,1_L\,7_L\,6_R\,3_R\,4_L\,2_R\,3_L\,5_R\,6_L\,8_L\,7_R)(1_R\,2_L)(4_R\,5_L)$. Any flip
operation changes the number of cycles by $-1$, 0, or $+1$. Therefore we need at least five
flips to reach the eight cycles $(0_R\,1_L)(1_R\,2_L) \ldots (7_R\,8_L)$. [J. Kececioglu and D. Sankoff,
Algorithmica 13 (1995), 180-210.]

The first flip must break the bond $6_L \mathrel{-\!-} 8_L$, because we get no new cycle when we
break two bonds that have the same left-to-right orientation in the linear arrangement.
This  leaves  five  possibilities  after  one  flip,  namely  g?g6g3  gsg?g?g? , ff?flig2S4fl,sS356, 
9 7 gi92g<i9s  gig'i  , g? , and  gegsgsgfg^gigT,  four  more  flips  suffice  to  sort 
all  but  the  second  of  these. 

Incidentally, there are $2^7 \cdot 7! = 645120$ different possible arrangements of $g_1, \ldots, g_7$,
and 179904 of them are at distance < 5 from tobacco order.
[An efficient algorithm to find the best way to sort any signed permutation by
reversals was first developed by S. Hannenhalli and P. Pevzner, JACM 46 (1999),
1-27. Improvements that solve the problem in $O(n^{1.5}\sqrt{\log n})$ time were subsequently
found by H. Kaplan and E. Verbin, J. Comp. Syst. Sci. 70 (2005), 321-341; E. Tannier,
A. Bergeron, and M.-F. Sagot, Discrete Applied Math. 155 (2007), 881-888.]

43.  Denote  an  arrangement  like  57tS,i<72ff4fl,593<76J  by  the  signed  permutation  7124536.  If 
there is a negated element, say $\bar k$ is present but not $\overline{k-1}$, one flip will create the 2-cycle
$((k-1)_R\,k_L)$. Similarly, if $\bar k$ is present but not $\overline{k+1}$, a single flip creates $(k_R\,(k+1)_L)$.
And if all flips of that special kind remove all negated elements, a single flip creates two
2-cycles. If no negated elements are present and the permutation isn't sorted, some
flip will preserve the number of cycles. Hence we can sort in $\le n$ flips if the given
permutation has a negated element, $\le n + 1$ otherwise.

When $n$ is even, the permutation $n\,(n-1) \ldots 1$ requires $n + 1$ flips, because it has
one cycle after the first flip. When $n > 3$ is odd, the permutation $2\,1\,3\,n\,(n-1) \ldots 4$
requires $n + 1$ by a similar argument.

44. Let $c_k$ be the number of cycles of length $2k$ in the multigraph of the previous
answers. An upper bound on the average value of $c_k$ can be found as follows: The total
number of potential $2k$-cycles is $2^k(n+1)^{\underline{k}}/(2k)$, because we can choose a sequence of $k$
distinct edges from $\{0_R \mathrel{-\!-} 1_L, \ldots, n_R \mathrel{-\!-} (n+1)_L\}$ in $(n+1)^{\underline{k}}$ ways and orient them in
$2^k$ ways; this counts each cycle $2k$ times, including impossible cases like $(1_R\,2_L\,2_R\,3_L)$
or $(1_R\,2_L\,3_L\,2_R\,3_R\,4_L)$ or $(1_R\,2_L\,6_R\,7_L\,4_L\,3_R\,2_R\,3_L\,6_L\,5_R)$. When $k \le n$, every possible
$2k$-cycle occurs in exactly $2^{n-k}(n-k)!$ signed permutations. For example, consider
the case $k = 5$, $n = 9$, and the cycle $(0_R\,1_L\,9_L\,8_R\,7_R\,8_L\,1_R\,2_L\,5_L\,4_R)$. This cycle
occurs in the multigraph if and only if the signed permutation begins with 4 and
contains the substrings 9187 and 25 or their reverses; we obtain all solutions by finding
all signed permutations of $\{1, 2, 3, 6\}$ and replacing 1 by 9187, 2 by 25. Therefore
$\mathrm{E}\,c_k \le \frac{1}{2k}\,2^k(n+1)^{\underline{k}}\,2^{n-k}(n-k)!/2^n n! = \frac12\bigl(1/k + 1/(n+1-k)\bigr)$. It follows that
$\mathrm{E}\,c = \sum_{k=1}^{n}\mathrm{E}\,c_k + \mathrm{E}\,c_{n+1} \le H_n + 1$. Since $n + 1 - c$ is a lower bound on the number
of flips, we need $\ge n + 1 - \mathrm{E}\,c \ge n - H_n$ of them.

[This proof uses ideas of V. Bafna and P. Pevzner, SICOMP 25 (1996), 272-289,
who studied the more difficult problem of sorting unsigned permutations by reversals.
In that problem, an interesting permutation that can be written as the product of
non-disjoint cycles $(1\,2\,3)(3\,4\,5)(5\,6\,7) \ldots$, ending with either $(n-1\ n)$ or $(n-2\ n-1\ n)$
depending on whether $n$ is even or odd, turns out to be the hardest to sort.]

SECTION  5.2 

1. Yes; $i$ and $j$ may run through the set of values $1 \le j < i \le N$ in any order,
possibly in parallel and/or as records are being read in.

2. The sorting is stable in the sense defined at the beginning of this chapter, because
the algorithm is essentially sorting by lexicographic order on the distinct key-pairs
$(K_1, 1), (K_2, 2), \ldots, (K_N, N)$. (If we think of each key as extended on the right by its
location in the file, no equal keys are present, and the sorting is stable.)

3. It would sort, but not in a stable manner; if $K_j = K_i$ and $j < i$, $R_j$ will come after
$R_i$ in the final ordering. This change would also make Program C run more slowly.
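The stability claims of exercises 2 and 3 can be observed directly. This Python sketch assumes the loop structure of comparison counting (Algorithm C): i runs from N down to 2, j runs from i - 1 down to 1, and COUNT[j] is increased when $K_i < K_j$, otherwise COUNT[i]. The flag `use_less_equal` is our name for the variant of exercise 3. The function returns the original indices in output order.

```python
def comparison_counting_order(keys, use_less_equal=False):
    """COUNT[i] becomes the number of records that must precede record i.
    use_less_equal=True changes the test to K_i <= K_j (exercise 3)."""
    N = len(keys)
    count = [0] * N
    for i in range(N - 1, 0, -1):             # i = N, N-1, ..., 2 (0-based here)
        for j in range(i - 1, -1, -1):        # j = i-1, ..., 1
            if (keys[i] <= keys[j]) if use_less_equal else (keys[i] < keys[j]):
                count[j] += 1
            else:
                count[i] += 1
    order = [None] * N
    for i, c in enumerate(count):
        order[c] = i                          # record i goes to position COUNT[i]+1
    return order

keys = [503, 87, 512, 61, 87, 503]
print(comparison_counting_order(keys))                       # [3, 1, 4, 0, 5, 2]
print(comparison_counting_order(keys, use_less_equal=True))  # [3, 4, 1, 5, 0, 2]
```

With the strict test, equal keys keep their input order; with the `<=` test they come out reversed, as exercise 3 states.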

4.    ENT1  N            1
      LD2   COUNT,1      N
      LDA   INPUT,1      N
      STA   OUTPUT+1,2   N
      DEC1  1            N
      J1P   *-4          N   |
5. The running time is decreased by $A + 1 - N - B$ units, and this is almost always
an improvement.

6. $u = 0$, $v = 9$.

After D1,   COUNT  =  0  0  0  0  0  0  0  0  0  0
After D2,   COUNT  =  2  2  1  0  1  3  3  2  1  1
After D4,   COUNT  =  2  4  5  5  6  9 12 14 15 16
During D5,  COUNT  =  2  3  5  5  5  8  9 12 15 16   (j = 8)
            OUTPUT =  -  -  - 1G  - 4A  -  - 5L 6A 6T 6I 7O 7N  -  -
After D5,   OUTPUT = 0C 0O 1N 1G 2R 4A 5T 5U 5L 6A 6T 6I 7O 7N 8S 9.

7. Yes; note that COUNT[$K_j$] is decreased in step D6, and $j$ decreases.

8. It would sort, but not in a stable manner (see exercise 7).
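Below is a Python sketch of distribution counting (Algorithm D) that can record its state partway through step D5. The input order is an assumption, since the exercise statement is not part of this excerpt. It was chosen to be consistent with the sorted output quoted in exercise 6. The snapshot taken just before record j = 8 is output can be compared with the trace given there.

```python
def distribution_counting(records, u, v, key, snapshot_at=None):
    """Sort records with integer keys in u..v. If snapshot_at is a 1-based j,
    also return copies of COUNT and OUTPUT just before record j is output."""
    count = [0] * (v - u + 1)                    # D1
    for r in records:                            # D2, D3
        count[key(r) - u] += 1
    for i in range(1, v - u + 1):                # D4: accumulate
        count[i] += count[i - 1]
    output = [None] * len(records)
    snapshot = None
    for j in range(len(records), 0, -1):         # D5: j = N, ..., 1
        if j == snapshot_at:
            snapshot = (count[:], output[:])
        r = records[j - 1]
        i = count[key(r) - u]                    # D6: i <- COUNT[K_j]
        output[i - 1] = r                        #     S_i <- R_j
        count[key(r) - u] = i - 1                #     COUNT[K_j] <- i - 1
    return output, snapshot

# Assumed input order (hypothetical), consistent with the output in exercise 6.
records = "5T 0C 5U 0O 9. 1N 8S 2R 6A 4A 1G 5L 6T 6I 7O 7N".split()
out, (count_j8, out_j8) = distribution_counting(records, 0, 9, lambda r: int(r[0]), snapshot_at=8)
print(" ".join(out))  # 0C 0O 1N 1G 2R 4A 5T 5U 5L 6A 6T 6I 7O 7N 8S 9.
print(count_j8)       # [2, 3, 5, 5, 5, 8, 9, 12, 15, 16]
```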


9. Let $M = v - u$; assume that $|u|$ and $|v|$ fit in two bytes. LOC($R_j$) $\equiv$ INPUT + $j$,
LOC(COUNT[$j$]) $\equiv$ COUNT + $j$, LOC($S_j$) $\equiv$ OUTPUT + $j$; rI1 $\equiv$ $i$; rI2 $\equiv$ $j$;
rI3 $\equiv$ $i - v$ or $K_j$.

M     EQU   V-U
KEY   EQU   0:2                     (Satellite information is in bytes 3:5.)
1H    ENN3  M              1        D1. Clear COUNTs.
      STZ   COUNT+V,3      M+1      COUNT[i] ← 0.
      INC3  1              M+1
      J3NP  *-2            M+1      u ≤ i ≤ v.
2H    ENT2  N              1        D2. Loop on j.
3H    LD3   INPUT,2(KEY)   N        D3. Increase COUNT[K_j].
      LDA   COUNT,3        N
      INCA  1              N
      STA   COUNT,3        N
      DEC2  1              N
      J2P   3B             N        N ≥ j > 0.
      ENN3  M-1            1        D4. Accumulate.
      LDA   COUNT+U        1        rA ← COUNT[i-1].
4H    ADD   COUNT+V,3      M        COUNT[i-1] + COUNT[i]
      STA   COUNT+V,3      M          → COUNT[i].
      INC3  1              M
      J3NP  4B             M        u < i ≤ v.
5H    ENT2  N              1        D5. Loop on j.
6H    LD3   INPUT,2(KEY)   N        D6. Output R_j.
      LD1   COUNT,3        N        i ← COUNT[K_j].
      LDA   INPUT,2        N        rA ← R_j.
      STA   OUTPUT,1       N        S_i ← rA.
      DEC1  1              N
      ST1   COUNT,3        N        COUNT[K_j] ← i-1.
      DEC2  1              N
      J2P   6B             N        N ≥ j > 0.  |

The running time is $(10M + 22N + 10)u$.

10. In order to avoid using N extra "tag" bits [see Section 1.3.3 and Cybernetics 1
(1965), 95], yet keep the running time essentially proportional to N, we may use the
following algorithm based on the cycle structure of the permutation:

P1. [Loop on $i$.] Do step P2 for $1 \le i \le N$; then terminate the algorithm.

P2. [Is $p(i) = i$?] Do steps P3 through P5, if $p(i) \ne i$.

P3. [Begin cycle.] Set $t \leftarrow R_i$, $j \leftarrow i$.

P4. [Fix $R_j$.] Set $k \leftarrow p(j)$, $R_j \leftarrow R_k$, $p(j) \leftarrow j$, $j \leftarrow k$. If $p(j) \ne i$, repeat this
step.

P5. [End cycle.] Set $R_j \leftarrow t$, $p(j) \leftarrow j$. |

This algorithm changes $p(i)$, since the sorting application lets us assume that $p(i)$ is
stored in memory. On the other hand, there are applications such as matrix transposition
where $p(i)$ is a function of $i$ that is to be computed (not tabulated) in order to
save memory space. In such a case we can use the following method, performing steps
B1 through B3 for $1 \le i \le N$.

B1. Set $k \leftarrow p(i)$.

B2. If $k > i$, set $k \leftarrow p(k)$ and repeat this step.

B3. If $k < i$, do nothing; but if $k = i$ (this means that $i$ is smallest in its cycle),
we permute the cycle containing $i$ as follows: Set $t \leftarrow R_k$, then while $p(k) \ne i$
repeatedly set $R_k \leftarrow R_{p(k)}$ and $k \leftarrow p(k)$; finally set $R_k \leftarrow t$. |

This algorithm is similar to the procedure of J. Boothroyd [Comp. J. 10 (1967),
310], but it requires less data movement; some refinements have been suggested by
I. D. G. MacLeod [Australian Comp. J. 2 (1970), 16-19]. For random permutations
the analysis in exercise 1.3.3-14 shows that step B2 is performed $(N + 1)H_N - N$ times
on the average. See also the references in the answer to exercise 1.3.3-12. Similar
algorithms can be designed to replace $(R_{p(1)}, \ldots, R_{p(N)})$ by $(R_1, \ldots, R_N)$, for example
if the rearrangement in exercise 4 were to be done with OUTPUT = INPUT.
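Both cycle-following methods of exercise 10 can be sketched in Python with 0-based indices. `rearrange_with_table` follows steps P1-P5 and resets the table p to the identity as it goes. `rearrange_with_function` follows steps B1-B3 and only evaluates p. `transpose_source` is a hypothetical p(i) for transposing a matrix stored by rows, which is the kind of application the answer mentions. All names are ours.

```python
def rearrange_with_table(R, p):
    """Steps P1-P5: replace R[i] by R[p[i]] for all i; p (a list) becomes the identity."""
    for i in range(len(R)):
        if p[i] != i:                        # P2
            t, j = R[i], i                   # P3
            while True:                      # P4
                k = p[j]
                R[j] = R[k]
                p[j] = j
                j = k
                if p[j] == i:
                    break
            R[j] = t                         # P5
            p[j] = j
    return R

def rearrange_with_function(R, p):
    """Steps B1-B3: same result, but p is a function that is only evaluated."""
    for i in range(len(R)):
        k = p(i)                             # B1
        while k > i:                         # B2
            k = p(k)
        if k == i:                           # B3: i is the smallest index in its cycle
            t = R[k]
            while p(k) != i:
                R[k] = R[p(k)]
                k = p(k)
            R[k] = t
    return R

def transpose_source(i, rows=2, cols=3):
    """Position, in a rows x cols matrix stored by rows, of the element that belongs
    at position i of its cols x rows transpose."""
    return (i % rows) * cols + i // rows

print(rearrange_with_function([1, 2, 3, 4, 5, 6], transpose_source))  # [1, 4, 2, 5, 3, 6]
```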


Let rI1 $\equiv i$; rI2 $\equiv j$; rI3 $\equiv k$; rX $\equiv t$.

    1H  ENT1  N         1      P1. Loop on i.
    2H  CMP1  P,1       N      P2. Is p(i) = i?
        JE    8F        N      Jump if p(i) = i.
    3H  LDX   INPUT,1   A-B    P3. Begin cycle. t ← R_i.
        ENT2  0,1       A-B    j ← i.
    4H  LD3   P,2       N-A    P4. Fix R_j. k ← p(j).
        LDA   INPUT,3   N-A
        STA   INPUT,2   N-A    R_j ← R_k.
        ST2   P,2       N-A    p(j) ← j.
        ENT2  0,3       N-A    j ← k.
        CMP1  P,2       N-A
        JNE   4B        N-A    Repeat if p(j) ≠ i.
    5H  STX   INPUT,2   A-B    P5. End cycle. R_j ← t.
        ST2   P,2       A-B    p(j) ← j.
    8H  DEC1  1         N
        J1P   2B        N      N ≥ i ≥ 1. ▮

The running time is $(17N - 5A - 7B + 1)u$, where $A$ is the number of cycles in the permutation $p(1)\ldots p(N)$ and $B$ is the number of fixed points (1-cycles). We have

$$A = \bigl(\min 1,\ \mathrm{ave}\ H_N,\ \max N,\ \mathrm{dev}\ \sqrt{H_N - H_N^{(2)}}\,\bigr) \quad\text{and}\quad B = (\min 0,\ \mathrm{ave}\ 1,\ \max N,\ \mathrm{dev}\ 1),$$

for $N \ge 2$, by Eqs. 1.3.3-(21) and 1.3.3-(28).
12. The obvious way is to run through the list, replacing the link of the $k$th element by the number $k$, and then to rearrange the elements in a second pass. The following more direct method, due to M. D. MacLaren, is shorter and faster if the records are not too long. (Assume for convenience that $0 \le \mathrm{LINK}(P) \le N$, for $1 \le P \le N$, where $\Lambda = 0$.)

M1. [Initialize.] Set $P \gets \mathrm{HEAD}$, $k \gets 1$.

M2. [Done?] If $P = \Lambda$ (or equivalently if $k = N + 1$), the algorithm terminates.

M3. [Ensure $P \ge k$.] If $P < k$, set $P \gets \mathrm{LINK}(P)$ and repeat this step.

M4. [Exchange.] Interchange $R_k$ and $R_P$. (Assume that $\mathrm{LINK}(k)$ and $\mathrm{LINK}(P)$ are also interchanged in this process.) Then set $Q \gets \mathrm{LINK}(k)$, $\mathrm{LINK}(k) \gets P$, $P \gets Q$, $k \gets k + 1$, and return to step M2. ▮

A proof that MacLaren's method is valid can be based on an inductive verification of the following property that holds at the beginning of step M2: The entries that are $\ge k$ in the sequence $P$, $\mathrm{LINK}(P)$, $\mathrm{LINK}(\mathrm{LINK}(P))$, …, $\Lambda$ are $\alpha_1, \alpha_2, \ldots, \alpha_{N+1-k}$, where $R_1 \le \cdots \le R_{k-1} \le R_{\alpha_1} \le \cdots \le R_{\alpha_{N+1-k}}$ is the desired final order of the records. Furthermore $\mathrm{LINK}(j) \ge j$ for $1 \le j < k$, so that $\mathrm{LINK}(j) = \Lambda$ implies $j \ge k$.

It is quite interesting to analyze MacLaren's algorithm; one of its remarkable properties is that it can be run backwards, reconstructing the original set of links from the final values of $\mathrm{LINK}(1)\ldots\mathrm{LINK}(N)$. Each of the $N!$ possible output configurations with $j \le \mathrm{LINK}(j) \le N$ corresponds to exactly one of the $N!$ possible input configurations. If $A$ is the number of times $P \gets \mathrm{LINK}(P)$ in step M3, then $N - A$ is the number of $j$ such that $\mathrm{LINK}(j) = j$ at the conclusion of the algorithm; this occurs if and only if $j$ was largest in its cycle; hence $N - A$ is the number of cycles in the permutation, and $A = (\min 0,\ \mathrm{ave}\ N - H_N,\ \max N - 1)$.

References:  M.  D.  MacLaren,  JACM  13  (1966),  404-411;  D.  Gries  and  J.  F.  Prins, 
Science  of  Computer  Programming  8 (1987),  139-145. 
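Steps M1 through M4 translate almost line for line into Python. This is a sketch, assuming records and links are held in 1-indexed lists (slot 0 unused) with $\Lambda$ represented by 0; the helper `link_by_key`, which builds a sorted linked list to feed the algorithm, is a hypothetical test harness rather than part of MacLaren's method.

```python
def maclaren(R, LINK, head):
    """Steps M1-M4: rearrange R[1..N] into the order of the linked list that
    starts at `head`. Lists are 1-indexed (slot 0 unused) and 0 plays the
    role of the null link. Returns A, the number of times P <- LINK(P)
    is performed in step M3."""
    P, k, A = head, 1, 0                    # M1
    while P != 0:                           # M2
        while P < k:                        # M3: follow a forwarding link
            P = LINK[P]
            A += 1
        R[k], R[P] = R[P], R[k]             # M4: exchange records and links
        LINK[k], LINK[P] = LINK[P], LINK[k]
        Q = LINK[k]
        LINK[k] = P                         # remember where the old R_k went
        P, k = Q, k + 1
    return A


def link_by_key(keys):
    """Hypothetical harness: 1-indexed records plus links in sorted order."""
    N = len(keys)
    R = [None] + list(keys)
    order = sorted(range(1, N + 1), key=lambda i: R[i])
    LINK = [0] * (N + 1)
    for a, b in zip(order, order[1:]):
        LINK[a] = b
    return R, LINK, (order[0] if order else 0)
```

The final links satisfy $j \le \mathrm{LINK}(j) \le N$, and $N - A$ equals the number of $j$ with $\mathrm{LINK}(j) = j$, as stated above.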

13. D5'. Set $r \gets N$.

D6'. If $r = 0$, stop. Otherwise, if $\mathrm{COUNT}[K_r] < r$ set $r \gets r - 1$ and repeat this step; if $\mathrm{COUNT}[K_r] = r$, decrease both $\mathrm{COUNT}[K_r]$ and $r$ by 1 and repeat this step. Otherwise set $R \gets R_r$, $j \gets \mathrm{COUNT}[K_r]$, $\mathrm{COUNT}[K_r] \gets j - 1$.

D7'. Set $S \gets R_j$, $k \gets \mathrm{COUNT}[K_j]$, $\mathrm{COUNT}[K_j] \gets k - 1$, $R_j \gets R$, $R \gets S$, $j \gets k$. Then if $j \ne r$ repeat this step; if $j = r$ set $R_j \gets R$, $r \gets r - 1$, and go back to D6'. ▮

To  prove  that  this  procedure  is  valid,  observe  that  at  the  beginning  of  step  D6'  all 
records  Rj  such  that  j > r that  are  not  in  their  final  resting  place  must  move  to  the 
left;  when  r = 0 there  can’t  be  any  such  records  since  somebody  must  move  right.  The 
algorithm  is  elegant  but  not  stable  for  equal  keys.  It  is  intimately  related  to  Foata’s 
construction  in  Theorem  5.1.2B. 
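Here is a Python sketch of steps D5' through D7', assuming integer keys in a known range $u \le K \le v$ and records stored in a 1-indexed list. The function name, the `key` parameter, and the counting prologue (which stands in for the earlier steps of Algorithm D and leaves $\mathrm{COUNT}[K]$ equal to the number of keys $\le K$) are illustrative assumptions. As the text warns, equal keys need not keep their original order.

```python
def inplace_counting_sort(R, u, v, key=lambda rec: rec):
    """Steps D5'-D7': sort R[1..N] in place by integer keys u <= K <= v.
    R[0] is unused; equal keys may end up in a different relative order."""
    N = len(R) - 1
    COUNT = [0] * (v - u + 1)                 # COUNT[K - u] plays COUNT[K]
    for j in range(1, N + 1):
        COUNT[key(R[j]) - u] += 1
    for i in range(1, v - u + 1):             # now COUNT[K] = number of keys <= K
        COUNT[i] += COUNT[i - 1]

    def c(rec):                               # index of COUNT[K] for a record
        return key(rec) - u

    r = N                                     # D5'
    while r > 0:                              # D6'
        top = COUNT[c(R[r])]
        if top < r:                           # R_r is placed, or must move left
            r -= 1
        elif top == r:                        # R_r belongs exactly here
            COUNT[c(R[r])] -= 1
            r -= 1
        else:                                 # R_r must move right: follow a cycle
            rec, j = R[r], top
            COUNT[c(rec)] = j - 1
            while True:                       # D7'
                S = R[j]
                k = COUNT[c(S)]
                COUNT[c(S)] = k - 1
                R[j] = rec
                rec = S
                j = k
                if j == r:
                    R[j] = rec
                    r -= 1
                    break
```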

SECTION  5.2.1 

1.  Yes;  equal  elements  are  never  moved  across  each  other. 

2.  Yes.  But  the  running  time  would  be  slower  when  equal  elements  are  present,  and 
the  sorting  would  be  just  the  opposite  of  stable. 

3.  The  following  eight-liner  is  conjectured  to  be  the  shortest  MIX  sorting  routine, 
although  it  is  not  recommended  for  speed.  We  assume  that  the  numbers  appear  in 
locations  1, . . . , N (that  is,  INPUT  EQU  0);  otherwise  another  line  of  code  is  necessary. 
    2H     LDA   0,1     B
           CMPA  1,1     B
           JLE   1F      B
           MOVE  1,1     A
           STA   0,1     A
    START  ENT1  N       A+1
    1H     DEC1  1       B+1
           J1P   2B      B+1

Note: To estimate the running time of this program, note that $A$ is the number of inversions. The quantity $B$ is a reasonably simple function of the inversion table, and (assuming distinct inputs in random order) it has the generating function

$$z^{N-1}(1+z)(1+z^2+z^{2+1})(1+z^3+z^{3+2}+z^{3+2+1})\ldots\bigl(1+z^{N-1}+z^{2N-3}+\cdots+z^{N(N-1)/2}\bigr)/N!\,.$$

The mean value of $B$ is $N - 1 + \sum_{1\le k\le N}(k-1)(2k-1)/6 = (N-1)(4N^2+N+36)/36$; hence the average running time of this program is roughly $\frac79N^3$ units.
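A small Python simulation of the eight-line program makes $A$ and $B$ concrete. It is a sketch under the assumption, read off the listing, that the MOVE/STA pair exchanges two adjacent keys and that control then falls through to START; $A$ counts exchanges and $B$ counts comparisons. Averaging $B$ over all permutations of a small $N$ should reproduce the mean quoted above.

```python
def eight_liner_counts(keys):
    """Simulate the eight-line MIX sort on a copy of keys.
    Returns (A, B): A = exchanges, B = comparisons (executions of CMPA)."""
    a = [None] + list(keys)        # 1-indexed, like locations 1..N
    N = len(keys)
    A = B = 0
    i = N                          # START: ENT1 N
    while True:
        i -= 1                     # 1H: DEC1 1
        if i <= 0:                 # J1P 2B not taken: the file is sorted
            break
        B += 1                     # 2H: LDA 0,1; CMPA 1,1
        if a[i] > a[i + 1]:        # JLE 1F not taken
            a[i], a[i + 1] = a[i + 1], a[i]   # MOVE 1,1 and STA 0,1
            A += 1
            i = N                  # fall through to START: ENT1 N
    assert a[1:] == sorted(keys)
    return A, B
```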

4. Consider the inversion table $B_1 \ldots B_N$ of the given input permutation, in the sense of exercise 5.1.1-7. Then $A$ is one less than the number of $B_j$'s that are equal to $j - 1$, and $B$ is the sum of the $B_j$'s. Hence both $B - A$ and $B$ are maximized when the input permutation is $N \ldots 2\,1$; they both are minimized when the input is $1\,2 \ldots N$. The minimum achievable time therefore occurs for $A = 0$ and $B = 0$, namely $(10N - 9)u$; the maximum occurs for $A = N - 1$ and $B = \binom N2$, namely $(4.5N^2 + 2.5N - 6)u$.

5. The generating function is $z^{10N-9}$ times the generating function for $9B - 3A$. By considering the inversion table as in the previous exercise, remembering that individual entries of the inversion table are independent of each other, the desired generating function is $z^{10N-9}\prod_{1<j\le N}\bigl((1 + z^9 + \cdots + z^{9j-18} + z^{9j-12})/j\bigr)$. The variance comes to $2.25N^3 + 3.375N^2 - 32.625N + 36H_N - 9H_N^{(2)}$.

6.  Treat  the  input  area  as  a circular  list,  with  position  N adjacent  to  position  1.  Take 
new  elements  to  be  inserted  from  either  the  left  or  the  right  of  the  current  segment 
of  unsorted  elements,  according  as  the  previously  inserted  element  fell  to  the  right  or 
left  of  the  center  of  the  sorted  elements,  respectively.  Afterwards  it  will  usually  be 
necessary  to  “rotate”  the  area,  moving  each  record  k places  around  the  circle  for  some 
fixed $k$; this can be done efficiently as in exercise 1.3.3-34.

7. The average value of $|a_j - j|$ is

$$\frac1n\bigl(|1-j| + |2-j| + \cdots + |n-j|\bigr) = \frac1n\left(\binom j2 + \binom{n-j+1}{2}\right);$$

summing on $j$ gives $\frac1n\left(\binom{n+1}3 + \binom{n+1}3\right) = \frac13(n^2-1)$. Incidentally, the variance of the stated sum can be shown to equal $[n > 1]\,(2n^2+7)(n+1)/45$.
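For small $n$ both the mean and the variance can be confirmed by enumerating all $n!$ permutations; this Python sketch (illustrative function name, exact arithmetic via `fractions`) does exactly that.

```python
from fractions import Fraction
from itertools import permutations

def displacement_stats(n):
    """Exact mean and variance of sum_j |a_j - j| over all permutations of 1..n."""
    values = [sum(abs(a - j) for j, a in enumerate(perm, start=1))
              for perm in permutations(range(1, n + 1))]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(x * x for x in values), len(values)) - mean * mean
    return mean, var
```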


8.  No;  for  example,  consider  the  keys  21111111111. 


9. For Table 3, $A = 3 + 0 + 2 + 1 = 6$, $B = 3 + 1 + 4 + 21 = 29$; in Table 4, $A = 4 + 2 + 2 + 0 = 8$, $B = 4 + 3 + 8 + 10 = 25$; hence the running time of Program D comes to $786u$ and $734u$, respectively. Although the number of moves has been cut from 41 to 25, the running time is not competitive with Program S since the bookkeeping time for four passes is wasted when $N = 16$. When sorting 16 items we will be better off using only two passes; a two-pass Program D begins to beat Program S at about $N = 13$, although they are fairly equal for a while (and for such small $N$ the length of the program is perhaps significant).
10. Insert “INC1 INPUT; ST1 0F(0:2)” between lines 07 and 08, and change lines 10-17 to:


    0H  CMPA  INPUT+N-H,1   NT-S
        JGE   7F            NT-S
    3H  ENT2  N-H,1         NT-S-C
    4H  LDX   INPUT,2       B
    5H  STX   INPUT+H,2     B
        DEC2  0,4           B
        J2NP  6F            B
        CMPA  INPUT,2       B-A
        JL    4B            B-A
    6H  STA   INPUT+H,2     NT-S-C   ▮


For a net increase of four instructions, this saves $3(C - T)$ units of time, where $C$ is the number of times $K_j \ge K_{j-h}$. In Tables 3 and 4 the time savings is approximately 87 and 88, respectively; empirically the value of $C/(NT - S)$ seems to be about 0.4 when $h_{s+1}/h_s \approx 2$ and about 0.3 when $h_{s+1}/h_s \approx 3$, so the improvement is worthwhile. (On
the other hand, the analogous change to Program S is not desirable, since the savings in that case is only proportional to $\log N$ unless the input is known to be pretty well ordered.)


12.  Changing  to  always  changes  the  number  of  inversions  by  ±1,  depending  on 
whether  the  change  is  above  or  below  the  diagonal. 

13. Put the weight $|i - j|$ on the segment from $(i, j-1)$ to $(i, j)$.

14. (a) Interchange $i$ and $j$ in the sum for $A_{2n}$ and add the two sums. (b) Taking half of this result, we see that


s u-oCtor.'-VO  - £ *(T) (2: :■•-?) 


0<t<j 


i,k>  0 


hence $\sum_n A_{2n}z^n = \sum_{k\ge0} kz^k\alpha^{2k}/(1-4z) = z/(1-4z)^2$, where $\alpha = \bigl(1 - \sqrt{1-4z}\,\bigr)/(2z)$.

The proof above was suggested to the author by Leonard Carlitz. Another proof can be based on interplay between horizontal and vertical weights (see exercise 13), and still another by the identity in the answer to exercise 5.2.2-16 with $f(k) = k$; but no simple combinatorial derivation of the formula $A_n = \lfloor n/2\rfloor\,2^{n-2}$ is apparent.
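The closed form $A_n = \lfloor n/2\rfloor\,2^{n-2}$ is easy to confirm numerically for small $n$. The following brute-force Python sketch (an illustrative check, not a proof; the function name is ours) enumerates every 2-ordered permutation and adds up the inversions.

```python
from itertools import permutations

def total_inversions_2ordered(n):
    """Sum of inversion counts over all 2-ordered permutations of 1..n
    (a_i < a_{i+2} for all valid i), found by brute force."""
    total = 0
    for a in permutations(range(1, n + 1)):
        if all(a[i] < a[i + 2] for i in range(n - 2)):
            total += sum(1 for x in range(n) for y in range(x + 1, n) if a[x] > a[y])
    return total
```

Comparing `4 * total_inversions_2ordered(n)` with `(n // 2) * 2**n` avoids the fractional power $2^{n-2}$ when $n = 1$.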

15. For $n > 0$,

$$\bar g_n(z) = z^n g_{n-1}(z);\qquad g_n(z) = \sum_{k=1}^{n}\bar g_k(z)\,g_{n-k}(z);$$

$$\bar h_n(z) = g_{n-1}(z) + z^n g_{n-1}(z);\qquad h_n(z) = \sum_{k=1}^{n}\bar h_k(z)\,h_{n-k}(z).$$

Letting $G(w,z) = \sum_n g_n(z)w^n$, we find that $wzG(w,z)G(wz,z) = G(w,z) - 1$. From this representation we can deduce that, if $t = \sqrt{1-4w} = 1 - 2w - 2w^2 - 4w^3 - \cdots$, we have $G(w,1) = (1-t)/(2w)$; $G_\prime(w,1) = 1/(wt) - (1-t)/(2w^2)$; $G'(w,1) = 1/(2t^2) - 1/(2t)$; $G_{\prime\prime}(w,1) = 2/(wt^3) - 2/(w^2t) + (1-t)/w^3$; $G'_\prime(w,1) = 2/t^4 - 1/t^3$; and


5.2.1 


ANSWERS  TO  EXERCISES  621 


$G''(w,1) = 1/t^3 - (1-2w)/t^4 + 10w^2/t^5$. Here lower primes denote differentiation with respect to the first parameter, and upper primes denote differentiation with respect to the second parameter. Similarly, from the formula

$$w\bigl(zG(wz,z) + G(w,z)\bigr)H(w,z) = H(w,z) - 1$$

we deduce that

$$H'(w,1) = w/t^4,\qquad H''(w,1) = -w/t^3 - w/t^4 + 2w/t^5 + (2w^2 + 20w^3)/t^7.$$

The formula manipulation summarized here was originally done by hand, but today it can readily be done by computer. In principle all moments of the distribution are obtainable in this way.

The generating function $g_n(z)$ also represents $\sum z^{\text{internal path length}}$ over all trees with $n + 1$ nodes; see exercise 2.3.4.5-5. It is interesting to note that $G(w,z)$ is equal to $F(-wz,z)/F(-w,z)$, where $F(z,q) = \sum_{n\ge0} z^n q^{n^2}\big/\prod_{k=1}^{n}(1-q^k)$; the coefficient of $q^m z^n$ in $F(z,q)$ is the number of partitions $m = p_1 + \cdots + p_n$ such that $p_j \ge p_{j+1} + 2$ for $1 \le j < n$ and $p_n > 0$ (see exercise 5.1.1-16).

16.  For  h = 2 the  maximum  clearly  occurs  for  the  path  that  goes  through  the  upper 
right  corner  of  the  lattice  diagram,  namely 

$$\binom{\lfloor n/2\rfloor + 1}{2}.$$

For  general  h the  corresponding  number  is 

where  q and  r are  defined  in  Theorem  H;  the  permutation  with 

$$a_{i+jh} = 1 + q(h - i) + (r - i)[i < r] \quad\text{for } 1 \le i \le h \text{ and } j \ge 0$$

maximizes the number of inversions between each of the $\binom h2$ pairs of sorted subsequences. The maximum number of moves is obtained if we replace / by / in (6).

17. The only 2-ordered permutation of $\{1, 2, \ldots, 2n\}$ that has as many as $\binom{n+1}2$ inversions is $n{+}1\;\;1\;\;n{+}2\;\;2\;\ldots\;2n\;\;n$. Using this idea recursively, we obtain the permutation defined by adding unity to each element of the sequence $(2^t - 1)^R \ldots 1^R\,0^R$, where $R$ denotes the operation of writing an integer as a $t$-bit binary number and reversing the left-to-right order of the bits(!).

18. Take out a common factor and let $h_t = 4N/\pi$; we want to minimize the sum $\sum_{s=1}^{t} h_s^{1/2}/h_{s-1}$, when $h_0 = 1$. Differentiation yields $h_s^3 = 4h_{s-1}^2h_{s+1}$, and we find $(2^t - 1)\lg h_1 = 2^{t+1} - 2(t+1) + \lg h_t$. The minimum value of the stated estimate comes
to  (1  — 2~t)n^2  -1)/21+(t-1)/(2t_1),  which  rapidly  approaches 

the  limiting  value  N\/tFN /2  as  t —>  oo. 

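Typical examples of “optimum” $h$'s when $N = 1000$ (see also Table 6) are:

$h_2 \approx 57.64$, $h_1 \approx 6.13$, $h_0 = 1$;
$h_3 \approx 135.30$, $h_2 \approx 22.05$, $h_1 \approx 4.45$, $h_0 = 1$;
$h_4 \approx 284.46$, $h_3 \approx 67.23$, $h_2 \approx 16.34$, $h_1 \approx 4.03$, $h_0 = 1$;
$h_9 \approx 9164.74$, $h_8 \approx 12294.05$, $h_7 \approx 7119.55$, $h_6 \approx 2708.95$, $h_5 \approx 835.50$,
$h_4 \approx 232.00$, $h_3 \approx 61.13$, $h_2 \approx 15.69$, $h_1 \approx 3.97$, $h_0 = 1$.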
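Writing $x_s = \lg h_s$, the stationarity condition becomes the linear recurrence $x_{s+1} = 3x_s - 2x_{s-1} - 2$, so the “optimum” increments can be recomputed in a few lines. This Python sketch assumes exactly the relations stated above ($h_0 = 1$, the fixed top value $h_t = 4N/\pi$, and the formula for $\lg h_1$); `optimum_increments` is a name chosen here.

```python
import math

def optimum_increments(N, t):
    """Stationary h_0, ..., h_t for the answer to exercise 18, with h_0 = 1
    and the fixed top value h_t = 4N/pi, working with x_s = lg h_s."""
    x_top = math.log2(4 * N / math.pi)
    x = [0.0] * (t + 1)
    # (2^t - 1) lg h_1 = 2^(t+1) - 2(t+1) + lg h_t
    x[1] = (2 ** (t + 1) - 2 * (t + 1) + x_top) / (2 ** t - 1)
    # h_s^3 = 4 h_{s-1}^2 h_{s+1}  <=>  x_{s+1} = 3 x_s - 2 x_{s-1} - 2
    for s in range(1, t):
        x[s + 1] = 3 * x[s] - 2 * x[s - 1] - 2
    return [2 ** xs for xs in x]
```

For example, `optimum_increments(1000, 3)` yields the first row of the table (together with $h_0 = 1$ and the top value $4000/\pi$).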
19.  Let  g(n,  h)  = Hr  — 1 + Ylr<j<h  <7/(/U  + r)>  where  q and  r are  defined  in  Theorem  H;
then  replace  / by  g in  (6). 

20. (This is much harder to write down than to understand.) Assume that a $k$-ordered file $R_1, \ldots, R_N$ has been $h$-sorted, and let $1 \le i \le N - k$; we want to show that $K_i \le K_{i+k}$. Find $u$, $v$ such that $i \equiv u$ and $i + k \equiv v$ (modulo $h$), $1 \le u, v \le h$; and apply Lemma L with $x_j = K_{v+(j-1)h}$, $y_j = K_{u+(j-1)h}$. Then the first $r$ elements $K_u, K_{u+h}, \ldots, K_{u+(r-1)h}$ of the $y$'s are respectively $\le$ the last $r$ elements $K_{u+k}, K_{u+k+h}, \ldots, K_{u+k+(r-1)h}$ of the $x$'s, where $r$ is the greatest integer such that $u + k + (r-1)h \le N$.

21. If $xh + yk = x'h + y'k$, we have $(x - x')h = (y' - y)k$, so $x' = x + tk$ and $y' = y - th$ for some integer $t$. Let $h'h + k'k = 1$; then $n = (nh')h + (nk')k$, so every integer $n$ has a unique representation of the form $n = xh + yk$ where $0 \le x < k$, and $n$ is generable if and only if $y \ge 0$. Let, similarly, $hk - h - k - n = x'h + y'k$; then $(x + x')h + (y + y')k = hk - h - k$. Hence $x + x' \equiv k - 1$ (modulo $k$) and we must have $x + x' = k - 1$. Hence $y + y' = -1$, and $y \ge 0$ if and only if $y' < 0$.

The symmetry of this result shows that exactly $\frac12(h-1)(k-1)$ positive integers are unrepresentable in the stated form, a result originally due to Sylvester [Mathematical Questions, with their Solutions, from the ‘Educational Times’ 41 (1884), 21].

22. To avoid cumbersome notation, consider $s = 4$, which is representative of the general case. Let $n_k$ be the smallest number that is congruent to $k$ (modulo 15) and representable in the form $15a_0 + 31a_1 + \cdots$; then we find easily that

    k   =  0   1   2   3   4    5    6    7    8    9   10   11   12   13   14
    n_k =  0  31  62  63  94  125  126  127  158  189  190  221  252  253  254.

Hence $239 = 2^4(2^4 - 1) - 1$ is the largest unrepresentable number, and the total number of unrepresentables is

$$\begin{aligned} x_4 &= (n_1 - 1 + n_2 - 2 + \cdots + n_{14} - 14)/15\\ &= (2 + 4 + 4 + 6 + 8 + 8) + 8 + (10 + 12 + 12 + 14 + 16 + 16) + 16\\ &= 2x_3 + 8\cdot 9; \end{aligned}$$

in general, $x_s = 2x_{s-1} + 2^{s-1}(2^{s-1} + 1)$.

For  the  other  problem  the  answers  are  22s  + 2s  + 2 and  2s-1  (2s  + s — 1)  + 2, 
respectively. 
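These counts are easy to confirm by machine. The Python sketch below uses a brute-force coin-change sieve rather than the residue argument of the text; the function name and the search limit $2^{2s+1}$ are our own assumptions (the sieve checks that the limit lies beyond the largest gap). It lists the positive integers not representable as $a_0(2^s-1) + a_1(2^{s+1}-1) + \cdots$.

```python
def unrepresentables(s, limit=None):
    """Positive integers NOT of the form a_0(2^s - 1) + a_1(2^(s+1) - 1) + ...
    with all a_k >= 0, found by a coin-change style sieve up to `limit`."""
    if limit is None:
        limit = 2 ** (2 * s + 1)      # assumed to exceed the largest gap
    gens = [2 ** k - 1 for k in range(s, 2 * s + 2) if 2 ** k - 1 <= limit]
    ok = [False] * (limit + 1)
    ok[0] = True
    for g in gens:
        for n in range(g, limit + 1):
            if ok[n - g]:
                ok[n] = True
    # Once 2^s - 1 consecutive values are representable, so is everything larger.
    assert all(ok[limit - (2 ** s - 1) + 1:])
    return [n for n in range(1, limit + 1) if not ok[n]]
```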

23.  Each  of  the  N numbers  has  at  most  2 — l)(/is+i  — 1) / /i„]  inversions  in  its 
subfile. 

24. (Solution obtained jointly with V. Pratt.) Construct the “$h$-recidivous permutation” of $\{1, 2, \ldots, N\}$ as follows. Start with $a_1 \ldots a_N$ blank; then for $j = 2, 3, 4, \ldots$ do Step $j$: Fill in all blank positions $a_i$ from left to right, using the smallest number that has not yet appeared in the permutation, whenever $(2^h - 1)j - i$ is a positive integer representable as in exercise 22. Continue until all positions are filled. Thus the 2-recidivous permutation for $N = 20$ is

6 2 1 9 4 3 12 7 5 15 10 8 17 13 11 19 16 14 20 18.

The $h$-recidivous permutation is $(2^k - 1)$-ordered for all $k \ge h$. When $2^h < j < N/(2^h - 1)$, exactly $2^h - 1$ positions are filled during step $j$; the $(k+1)$st of them adds at least $2^{h-1} - 2k$ to the number of moves required to $(2^{h-1} - 1)$-sort the permutation. Hence the number of moves to sort the $h$-recidivous permutation with increments
hs  = 2s  — 1 when  N = 2h+1(2h  — 1)  is  > 23h~4  > ^ IV3/2.  Pratt  generalized  this 




construction  to  a large  family  of  similar  sequences,  including  (12),  in  his  Ph.D.  thesis 
(Stanford  University,  1972).  Heuristics  that  find  permutations  needing  even  more  moves 
are  discussed  by  H.  Erkio,  BIT  20  (1980),  130-136.  See  also  Weiss  and  Sedgewick, 
J.  Algorithms  11  (1990),  242-251,  for  improvements  on  Pratt’s  construction. 
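The construction of exercise 24 is mechanical enough to sketch in Python. This is an illustrative reading of Step $j$ with helper names chosen here; the representability test uses the increments $2^h - 1, 2^{h+1} - 1, \ldots$ of exercise 22 and a simple sieve that is extended on demand.

```python
def recidivous(h, N):
    """Return the h-recidivous permutation a_1 ... a_N as a Python list."""
    gens = []                  # 2^h - 1, 2^(h+1) - 1, ... as far as needed
    reach = [True]             # reach[n]: n is a sum of such generators

    def representable(n):
        while len(reach) <= n:
            m = len(reach)
            while not gens or gens[-1] < m:
                gens.append(2 ** (h + len(gens)) - 1)
            reach.append(any(g <= m and reach[m - g] for g in gens))
        return n > 0 and reach[n]

    a = [None] * (N + 1)       # 1-indexed; a[0] is unused
    next_value, j = 1, 2
    while next_value <= N:
        for i in range(1, N + 1):          # Step j, scanning left to right
            if a[i] is None and representable((2 ** h - 1) * j - i):
                a[i] = next_value
                next_value += 1
        j += 1
    return a[1:]
```

For instance, `recidivous(2, 20)` reproduces the 2-recidivous permutation displayed in the answer, which is 3-, 7-, and 15-ordered.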

25. $F_{N+1}$ [this result is due to H. B. Mann, Econometrica 13 (1945), 256]; for the permutation must begin with either 1 or 2 1. There are at most $\lfloor N/2\rfloor$ inversions; and the total number of inversions is

$$\frac{N-1}{5}F_N + \frac{2N}{5}F_{N-1}.$$

(See exercise 1.2.8-12.) Note that the $F_{N+1}$ permutations can conveniently be represented by “Morse code” sequences of dots and dashes, where a dash corresponds to an inversion; see exercise 4.5.3-32. Hence we have found the total number of dashes among all Morse code sequences of length $N$.

Our derivation shows that a random 3- and 2-ordered permutation has roughly $\frac15(\phi^{-1} + 2\phi^{-2})N = \phi^{-1}N/\sqrt5 \approx .276N$ inversions. But if a random permutation is 3-sorted, then 2-sorted, exercise 42 shows that it has $\approx N/4$ inversions; if it is 2-sorted, then 3-sorted, it has $\approx N/3$.
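A brute-force Python check of the count $F_{N+1}$ and of the inversion total for small $N$; the helper names are ours, and the Fibonacci convention $F_0 = 0$, $F_1 = 1$ is assumed.

```python
from fractions import Fraction
from itertools import permutations

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def three_two_ordered_stats(N):
    """Count the permutations of 1..N that are both 2-ordered and 3-ordered,
    and total their inversions (brute force)."""
    count = total = 0
    for a in permutations(range(1, N + 1)):
        if all(a[i] < a[i + 2] for i in range(N - 2)) and \
           all(a[i] < a[i + 3] for i in range(N - 3)):
            count += 1
            total += sum(1 for x in range(N) for y in range(x + 1, N) if a[x] > a[y])
    return count, total
```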

26. Yes; a shortest example is 4 1 3 7 2 6 8 5, which has nine inversions. In general, the construction $a_{3k+s} = 3k + 4s$ for $-1 \le s \le 1$ yields files that are 3-, 5-, and 7-ordered, having approximately $\frac43N$ inversions. When $N \bmod 3 = 2$ this construction is best possible.

27.  (a)  See  J.  Algorithms  15  (1993),  101-124.  A simpler  proof,  which  shows  that  c 
can  be  any  constant  < | , was  found  independently  by  C.  G.  Plaxton  and  T.  Suel, 
J.  Algorithms  23  (1997),  221-240.  (b)  This  is  obvious  if  m > |c2(ln  N/  lnln IV)2. 
Otherwise $N^{1+c/\sqrt m} \ge N(\ln N)^2$. R. E. Cypher [SICOMP 22 (1993), 62-71] has proved the slightly stronger bound $\Omega(N(\log N)^2/\log\log N)$ when the increments satisfy $h_{s+1} > h_s$ for all $s$ and when a sorting network is constructed as in exercise 5.3.4-2. No nontrivial lower bounds are yet known for the asymptotic average running time.

28.  209  109  41  19  5 1,  from  (11).  But  better  sequences  are  possible;  see  exercise  29. 

29. Experiments by C. Tribolet in 1971 resulted in the choices 373 137 53 19 7 3 1 ($B_{\mathrm{ave}} \approx 7210$) and 317 101 31 11 3 1 ($B_{\mathrm{ave}} \approx 8170$). [The first of these yields a sorting time of $\approx 127720u$, compared to $\approx 128593u$ when the same data are sorted using
increments  (11).]  In  general  Tribolet  suggests  letting  hs  be  the  nearest  prime  number  to 
Ns^.  Experiments  by  Shelby  Siegel  in  1972  indicate  that  the  best  number  of  increments 
in  such  a method,  for  N < 10000,  is  t tv  | ln(JV/5.75).  On  the  other  hand,  Marcin 
Ciura's experiments [Lect. Notes Comp. Sci. 2138 (2001), 106-117] indicate that the minimum 7-pass $B_{\mathrm{ave}}$ ($\approx 6879$) is obtained with increments 229 96 41 19 10 4 1, while the sequence 737 176 69 27 10 4 1 yields the smallest total sorting time ($\approx 125077u$).

The best three-increment sequence, according to extensive tests by Carole M. McNamee, appears to be 45 7 1 ($B_{\mathrm{ave}} \approx 18240$). For four increments, 91 23 7 1 was the winner in her tests ($B_{\mathrm{ave}} \approx 11865$), but a rather broad range of increments gave roughly the same performance.

30. The number of integer points in the triangular region

$$\{x\ln 2 + y\ln 3 < \ln N,\ x \ge 0,\ y \ge 0\}\quad\text{is}\quad \tfrac12(\log_2 N)(\log_3 N) + O(\log N).$$

While we are $h$-sorting, the file is already $2h$-ordered and $3h$-ordered, by Theorem K; hence exercise 25 applies.
    START  ENT3  T                1
    1H     LD4   H,3              T
           ENN2  -INPUT-N,4       T
           ST2   6F(0:2)          T
           ST2   7F(0:2)          T
           ST2   4F(0:2)          T
           ENT2  0,4              T
           JMP   9F               T
    2H     LDA   INPUT+N,1        NT-S-B+A
    4H     CMPA  INPUT+N-H,1      NT-S-B+A
           JGE   8F               NT-S-B+A
    6H     LDX   INPUT+N-H,1      B
           STX   INPUT+N,1        B
    7H     STA   INPUT+N-H,1      B
           INC1  0,4              B
    8H     INC1  0,4              NT-B+A
           J1NP  2B               NT-B+A
           DEC2  1                S
    9H     ENT1  -N,2             T+S
           J2P   8B               T+S
           DEC3  1                T
           J3P   1B               T

Here $A$ is related to right-to-left maxima in the same way that $A$ in Program D is related to left-to-right minima; both quantities have the same statistical behavior. The simplifications in the inner loop have cut the running time to $7NT + 7A - 2S + 1 + 15T$ units, curiously independent of $B$!

When $N = 8$ the increments are 6, 4, 3, 2, 1, and we have $A_{\mathrm{ave}} = 3.892$, $B_{\mathrm{ave}} = 6.762$; the average total running time is $276.24u$. (Compare with Table 5.) Both $A$ and $B$ are maximized in the permutation 7 3 8 4 5 1 6 2. When $N = 1000$ there are 40 increments, 972, 864, 768, 729, …, 8, 6, 4, 3, 2, 1; empirical tests like those in Table 6 give $A \approx 875$, $B \approx 4250$, and a total time of about $268000u$ (more than twice as long as Program D with the increments of exercise 28).

Instead of storing the increments in an auxiliary table, it is convenient to generate them as follows on a binary machine:

P1. Set $m \gets 2^{\lceil\lg N\rceil - 1}$, the largest power of 2 less than $N$.

P2. Set $h \gets m$.

P3. Use $h$ as the increment for one sorting pass.

P4. If $h$ is even, set $h \gets h + h/2$; then if $h < N$, return to P3.

P5. Set $m \gets \lfloor m/2\rfloor$ and if $m \ge 1$ return to P2. ▮

Although the increments are not being generated in descending order, the order specified here is sufficient to make the sorting algorithm valid.
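Steps P1 through P5 are easy to mirror in Python. In this sketch the bit-length trick for P1 and the straight-insertion passes in `shellsort` are our own choices (the text only prescribes the increment order); the test checks the property that makes that order valid, namely that $2h$ and $3h$ are used before $h$ whenever they are less than $N$.

```python
def pratt_increments(N):
    """Increments 2^p 3^q < N in the order produced by steps P1-P5."""
    out = []
    if N < 2:
        return out
    m = 1 << ((N - 1).bit_length() - 1)   # P1: largest power of 2 less than N
    while m >= 1:
        h = m                             # P2
        while True:
            out.append(h)                 # P3: one sorting pass with increment h
            if h % 2 == 1:                # P4: h odd, so go on to P5
                break
            h += h // 2                   # P4: trade a factor 2 for a factor 3
            if h >= N:
                break
        m //= 2                           # P5
    return out


def shellsort(keys):
    """Plain Shellsort (straight insertion on each h-chain) with those increments."""
    a = list(keys)
    for h in pratt_increments(len(a)):
        for j in range(h, len(a)):
            x, i = a[j], j - h
            while i >= 0 and a[i] > x:
                a[i + h] = a[i]
                i -= h
            a[i + h] = x
    return a
```

For $N = 8$ the generated order is 4, 6, 2, 3, 1, and for $N = 1000$ it contains the 40 increments mentioned above.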

32.  4 12  11  13  2 0 8 5 10  14  1 6 3 9 16  7 15. 

33. Two types of improvements can be made. First, by assuming that the artificial key $K_0$ is $\infty$, we can omit testing whether or not $p > 0$. (This idea has been used, for example, in Algorithm 2.2.4A.) Secondly, a standard optimization technique: We can make two copies of the inner loop with the register assignments for $p$ and $q$ interchanged; this avoids the assignment $q \gets p$. (This idea has been used in exercise 1.1-3.)
Thus we assume that location INPUT contains the largest possible value in its (0:3) field, and we replace lines 07 and following of Program L by:


    07  8H  LD3   INPUT,2(LINK)   B'   p ← L_q. (Here p ≡ rI3, q ≡ rI2.)
    08      CMPA  INPUT,3(KEY)    B'
    09      JG    4F              B'   To L4 with q ↔ p if K > K_p.
    10  7H  ST1   INPUT,2(LINK)   N'   L_q ← j.
    11      ST3   INPUT,1(LINK)   N'   L_j ← p.
    12      JMP   6F              N'   Go to decrease j.
    13  4H  LD2   INPUT,3(LINK)   B"   p ← L_q. (Here p ≡ rI2, q ≡ rI3.)
    14      CMPA  INPUT,2(KEY)    B"
    15      JG    8B              B"   To L4 with q ↔ p if K > K_p.
    16  5H  ST1   INPUT,3(LINK)   N"   L_q ← j.
    17      ST2   INPUT,1(LINK)   N"   L_j ← p.
    18  6H  DEC1  1               N    j ← j - 1.
    19      ENT3  0               N    q ← 0.
    20      LDA   INPUT,1         N    K ← K_j.
    21      J1P   4B              N    N > j ≥ 1. ▮

Here  B'  + B"  = B + N — 1,  N'  + N"  = N — 1,  so  the  total  running  time  is 
5 B + 14 A’  + N'  — .3  units.  Since  N1  is  the  number  of  elements  with  an  odd  number  of 
lesser  elements  to  their  right,  it  has  the  statistics 

$$\bigl(\min 0,\ \mathrm{ave}\ \tfrac12N + \tfrac14H_{\lfloor N/2\rfloor} - \tfrac12H_N,\ \max N - 1\bigr).$$
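The stated average can be checked exhaustively for small $N$. This brute-force Python sketch (illustrative names, exact fractions) counts the elements that have an odd number of smaller elements to their right, which is how the text characterizes $N'$.

```python
from fractions import Fraction
from itertools import permutations

def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def average_odd_lesser(N):
    """Average, over all permutations of 1..N, of the number of elements
    having an odd number of smaller elements to their right."""
    total = count = 0
    for a in permutations(range(1, N + 1)):
        total += sum(1 for i in range(N)
                     if sum(1 for y in a[i + 1:] if y < a[i]) % 2 == 1)
        count += 1
    return Fraction(total, count)
```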

The $\infty$ trick also speeds up Program S; the following code suggested by J. H. Halperin uses this idea and the MOVE instruction to reduce the running time to $(6B + 11N - 10)u$, assuming that location INPUT+N+1 already contains the largest possible one-word value:


    START  ENT2  N-1        1
    2H     LDA   INPUT,2    N-1
           ENT1  INPUT,2    N-1
           JMP   3F         N-1
    4H     MOVE  1,1(1)     B
    3H     CMPA  1,1        B+N-1
           JG    4B         B+N-1
    5H     STA   0,1        N-1
           DEC2  1          N-1
           J2P   2B         N-1

Doubling  up  the  inner  loop  would  save  an  additional  B/2  or  so  units  of  time. 

34. There are $\binom Nn$ sequences of $N$ choices in which the given list is chosen $n$ times; every such sequence has probability $(1/M)^n(1 - 1/M)^{N-n}$ of occurring, since the given list is chosen with probability $1/M$.


    24         ENT1  0               1
    25         ENT2  1-M             1
    26  7H     LD3   HEAD+M,2        M
    27         J3Z   8F              M
    28         ST3   INPUT,1(LINK)   M-E
    29         ENT1  0,3             N
    30         LD3   INPUT,1(LINK)   N
    31         J3P   *-2             N
    32  8H     INC2  1               M
    33         J2NP  7B              M
Note: If Program M were modified to keep track of the current end of each list, by inserting “ST1 END,4” between lines 19 and 20, we could save time by hooking the lists together as in Algorithm 5.2.5H.

36. Program L: $A = 3$, $B = 41$, $N = 16$, time $= 496u$. Program M: $A = 2 + 1 + 1 + 3 = 7$, $B = 2 + 0 + 3 + 3 = 8$, $N = 16$, time $= 549u$. (We should also add the time needed by exercise 35, $94u$, in order to make a strictly fair comparison. The multiplications are slow! Notice also that the improved Program L in exercise 33 takes only $358u$.)

37. The stated identity is equivalent to

$$g_{NM}(z) = M^{-N}\sum_{n_1+\cdots+n_M=N}\binom{N}{n_1,\ldots,n_M}g_{n_1}(z)\ldots g_{n_M}(z),$$

which is proved as in exercise 34. It may be of interest to tabulate some of these generating functions, to indicate the trend for increasing $M$:

$$\begin{aligned}
g_{41}(z) &= (216 + 648z + 1080z^2 + 1296z^3 + 1080z^4 + 648z^5 + 216z^6)/5184,\\
g_{42}(z) &= (945 + 1917z + 1485z^2 + 594z^3 + 135z^4 + 81z^5 + 27z^6)/5184,\\
g_{43}(z) &= (1704 + 2264z + 840z^2 + 304z^3 + 40z^4 + 24z^5 + 8z^6)/5184.
\end{aligned}$$


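If $G_M(w,z)$ is the stated double generating function, differentiation by $z$ gives

$$G_M'(w,z) = M\Bigl(\sum_{n\ge0} g_n'(z)\frac{w^n}{n!}\Bigr)\Bigl(\sum_{n\ge0} g_n(z)\frac{w^n}{n!}\Bigr)^{M-1};$$

hence

$$\sum_{N\ge0} g_{NM}'(1)\frac{M^Nw^N}{N!} = Me^{(M-1)w}\Bigl(\frac{w^2}{4}e^w\Bigr) = \frac M4w^2e^{Mw};$$

similarly, the formula $g_n''(1) = \frac32\binom n4 + \frac53\binom n3$ yields

$$\sum_{N\ge0} g_{NM}''(1)\frac{M^Nw^N}{N!} = M(M-1)e^{(M-2)w}\Bigl(\frac{w^2}{4}e^w\Bigr)^2 + Me^{(M-1)w}\Bigl(\frac{w^4}{16} + \frac{5w^3}{18}\Bigr)e^w.$$

Equating coefficients of $w^N$ gives $g_{NM}''(1) = \bigl(\frac32\binom N4 + \frac53\binom N3\bigr)M^{-2}$, and the variance is $\bigl(\frac16\binom N3 + (\frac M2 - \frac14)\binom N2\bigr)M^{-2}$.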
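The table and the moment formulas can be cross-checked by expanding the multinomial identity directly with exact rational arithmetic. This Python sketch assumes only that $g_n(z) = \prod_{k=1}^n(1 + z + \cdots + z^{k-1})/k$ is the inversion generating function for $n$ random keys; all function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def inversions_gf(n):
    """g_n(z) = prod_{k=1}^n (1 + z + ... + z^(k-1))/k as a coefficient list."""
    g = [Fraction(1)]
    for k in range(1, n + 1):
        g = poly_mul(g, [Fraction(1, k)] * k)
    return g

def compositions(N, M):
    """All (n_1, ..., n_M) with n_i >= 0 and n_1 + ... + n_M = N."""
    if M == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, M - 1):
            yield (first,) + rest

def g_NM(N, M):
    """M^(-N) times the sum of multinomial(N; n_1..n_M) g_{n_1}(z)...g_{n_M}(z)."""
    total = [Fraction(0)] * (N * (N - 1) // 2 + 1)
    for ns in compositions(N, M):
        coeff = factorial(N)
        term = [Fraction(1)]
        for n in ns:
            coeff //= factorial(n)
            term = poly_mul(term, inversions_gf(n))
        for d, c in enumerate(term):
            total[d] += coeff * c
    return [c / M ** N for c in total]
```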

38.  = (a)E  iPh  setting  p3  = F(j /M)  - F((j  - 1)/M), 
and  F'(x)  = f(x),  this  is  asymptotic  to  ('^)  /M  times  f(>  f(x)2  dx  when  F is  reasonably 
well  behaved.  [However,  f*  f(x)2  dx  might  be  quite  large.  See  Theorem  5.2.5T  for  a 
refinement  that  applies  to  all  bounded  integrable  densities.] 

39. To minimize $AC/M + BM$ we need $M = \sqrt{AC/B}$, so $M$ is one of the integers just above or below this quantity. (In the case of Program M we would choose $M$ proportional to $N$.)

40. The asymptotic series for

$$\sum_{n>N} n^{-1}(1 - \alpha/N)^{n-N} = -N^{-1} + \sum_{k\ge0}(N+k)^{-1}(1 - \alpha/N)^k$$

can be obtained by restricting $k$ to $O(N^{1+\epsilon})$, expanding $(1 - \alpha/N)^k$ as $e^{-\alpha k/N}$ times $(1 - k\alpha^2/2N^2 + \cdots)$, and using Euler's summation formula; it begins with the terms $e^\alpha E_1(\alpha)(1 + \alpha^2/2N) - (1 + \alpha)/2N + O(N^{-2})$. Hence the asymptotic value of (15) is

$$N\bigl(\ln\alpha + \gamma + E_1(\alpha)\bigr)/\alpha + \bigl(1 - e^{-\alpha}(1 + \alpha)\bigr)/2\alpha + O(N^{-1}).$$

[The coefficient of $N$ is $\approx 0.7966$, $0.6596$, $0.2880$, respectively, for $\alpha = 1, 2, 10$.] Note that we have $\ln\alpha + \gamma + E_1(\alpha) = \int_0^\alpha (1 - e^{-t})\,dt/t$, by exercise 5.2.2-43.
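The numerical coefficients quoted in the answer to exercise 40 can be reproduced from the integral form. The following Python sketch evaluates $\int_0^\alpha(1 - e^{-t})\,dt/t$ with Simpson's rule (a quadrature method chosen here for simplicity, not taken from the text) and divides by $\alpha$; the function names are illustrative.

```python
import math

def ein(alpha, steps=2000):
    """integral_0^alpha (1 - e^(-t)) dt / t by Simpson's rule (steps must be even).
    The text identifies this with ln(alpha) + gamma + E_1(alpha)."""
    def f(t):
        return 1.0 if t == 0 else -math.expm1(-t) / t   # (1 - e^-t)/t, with f(0) = 1
    h = alpha / steps
    s = f(0) + f(alpha)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3

def leading_coefficient(alpha):
    """Coefficient of N in the asymptotic value of (15)."""
    return ein(alpha) / alpha
```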

41.  (a)  We  have  cik  = 0(pk),  because  the  prime  number  theorem  implies  that  the 

number  of  primes  between  pk  and  pk+1  is  (pk+1/(k  + 1)  — pk/k) /In  p + 0(pk/k2);  this  is 
positive  for  all  sufficiently  large  k.  Therefore  the  sum  of  the  first  (k)  elements  of  (10) 
is  b(<H,aj)  = Ei <i<j<k  0{pl+J);  and  we  have 

V i+i  pV  - l)^-1  - 1) 

,<£</  ■ 

(b) If $\binom{k-1}2 < \log_p N \le \binom k2$ we have $(k-2)^2 < 2\log_p N$, hence $p^{2k} = O\bigl(\exp(c\sqrt{\ln N})\bigr)$.

Notice that as $p \to 1$, the base sequence $a_1, a_2, \ldots$ becomes equal to the sequence of prime numbers, and the bound in Theorem I reduces to $O\bigl(N(\log N)^4(\log\log N)^{-2}\bigr)$.

42.  (a)  [A.  C.  Yao,  J.  Algorithms  1 (1980),  14-50.]  We  can  show  that  each  of  the 
(2)  pairs  of  lists  contributes  ^g~2h~3^2  N3^2  + O(Nfgh)  inversions  to  each  subfile 
(Ka,  Ka+g,  Ka+2g,  ■ ■ . ),  1 < a < g.  For  example,  suppose  h = 12,  g = 5,  a = 1,  and  con- 
sider inversions  where  the  lists  K3  < Ki$  < K27  < ■ ■ ■ and  K7  < K\g  < K31  < • • • inter- 
sect the  subfile  (Ki,Ke,  K\\, . . . ).  After  the  first  pass,  (K3,  Kr,  Ki$,  K 19,  K27,  K31, . . . ) 
is  a random  2-ordered  permutation.  The  elements  Kj  of  concern  to  us  have  j = 1 
(modulo  5)  and  j = 3 or  7 (modulo  12);  hence  j = 51  or  31  (modulo  60),  and  we  want 
to  compute  the  average  value  of  p(51,31)  where 

9 ( X ■ y)  = 'y  , ([Kx+ghj  > Ky+ghk]  + [Ky+ghj  > Tfx+ghfc])  + f(x,  y)  > 
j<k 

r(x,y)  = Y [Kmin(x  ,y)+ghj  ^ 7fmax(x,j/)+ghj]  < N/ gh  + 1 . 

3 

If  |p|  < 9 and  |q|  < g we  have 

[Kj+ph  — gh  Kk+qh+gh\  5:  [Ej  > Kk\  ^ [Kj+ph+gh  ^ Tffc+qh  — gh]  j 

hence 

[Kx+ghj  ^ Tfy+ghfc]  "F  \Ky+gh.j  ^ Kx+ghk\ 

[7fx+ph+gh(g  + l)  > 7fy+qfe+gh(k-l)]  + [Ey+qh+gh(j+l)  > 7fx+ph+gfc(A:  — 1)  ] 

and  it  follows  that  g(x,y)  < g(x  + ph,y  + qh)  + 8N/gh.  Similarly  we  find  g(x,y)  > 
g(x  + ph,y  + qh)  — 8N/gh.  But  the  sum  of  g(x,y)  over  all  g2  pairs  ( x,y ) such  that 
x mod  h = b and  y mod  h = c,  for  any  given  6 / c,  is  the  total  number  of  inversions 
in  a random  2-ordered  permutation  of  2 N/h  elements.  Therefore  by  exercise  14,  the 
average  value  of  g(x,y)  is  g~2-\/n/128  (2 N/h)3/2  + O(Nfgh). 

(b)  See  S.  Janson  and  D.  E.  Knuth,  Random  Structures  & Algorithms  10  (1997), 
125-142.  For  large  g and  h we  have  ip(h,g)  = \Jnh/\.28g  + 0(g~1^2h1^2)  + 0(gh~ 1//2). 

43. If $K < K_l$ after step D3, set $(K_l, \ldots, K_{j-h}, K_j) \gets (K, K_l, \ldots, K_{j-h})$; otherwise do steps D4 and D5 until $K \ge K_i$. Here $l = 1$ when $j = h + 1$, and $l \gets l + 1 - h[l = h]$ when $j$ increases by 1. [See H. W. Thimbleby, Software Practice & Exper. 19 (1989), 303-307.] However, with a decent sequence of increments the inner loop is not performed often enough to make this change desirable.
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Another  idea  for  speeding  up  the  program  [see  W.  Dobosiewicz,  Inf.  Proc.  Letters 
11  (1980),  5-6]  is  to  sort  only  partially  when  h > 1,  not  attempting  to  propagate  Kj 
further  left  than  position  j — h;  but  that  approach  seems  to  require  more  increments. 

44.  (a)  Yes.  This  is  clear  whenever  n1  is  one  step  above  n,  and  exercise  5.1.1-29  shows 
that  there  is  a path  of  adjacent  transpositions  from  it  to  any  permutation  above  it. 

(b)  Yes.  Similarly,  if  n is  above  7r',  nR  is  below  n'R. 

(c)  No;  2 1 3 is  neither  above  nor  below  312,  but  2 1 3 < 3 1 2. 

[The  partial  ordering  n < ir'  was  first  discussed  by  C.  Ehresmann,  Annals  of  Math. 
(2)  35  (1934),  396-443,  §20,  in  the  context  of  algebraic  topology.  Many  mathematicians 
now  call  it  the  “Bruhat  order”  of  permutations,  while  aboveness  is  called  the  “weak 
Bruhat  order”  — although  aboveness  is  actually  a stronger  condition,  because  it  holds 
less  often.  Only  the  weak  order  defines  a lattice.] 


SECTION  5.2.2 

1.  No;  it  has  2m  + 1 fewer  inversions,  where  m > 0 is  the  number  of  elements  a*,  such 
that  i < k < j and  o,  > a*,  > aj . (Hence  all  exchange-sorting  methods  will  eventually 
converge  to  a sorted  permutation.) 

2.  (a)  6.  (b)  [A.  Cayley,  Philosophical  Mag.  (3)  34  (1849),  527-529.]  Consider 
the  cycle  representation  of  7r.  Any  exchange  of  elements  in  the  same  cycle  increases 
the  number  of  cycles  by  1;  any  exchange  of  elements  in  different  cycles  decreases  the 
number  by  1.  (This  is  essentially  the  content  of  exercise  2. 2.4-3.)  A completely  sorted 
permutation  is  characterized  by  having  n cycles.  Hence  xch(7r)  is  n minus  the  number 
of  cycles  in  7r.  (Algorithm  5.2.3S  does  exactly  xch(7r)  exchanges;  see  exercise  5. 2. 3-4.) 

3.  Yes;  equal  elements  are  never  moved  across  each  other. 

4.  It  is  the  probability  that  bi  > max(62, . . . , bn)  in  the  inversion  table,  namely 


\J ir /2n  + 0(n  x)  = negligible. 


5.  We  may  assume  that  r > 0.  Let  fe'j  = (bi  - r + 1)  [bi  > r]  be  the  inversion  table  after 
r — 1 passes.  If  b\  > 0,  element  i is  preceded  by  b't  larger  elements,  the  largest  of  which 
will  bubble  up  at  least  to  position  b\  + i,  because  there  are  i elements  < i.  Furthermore 
if  element  j is  the  rightmost  to  be  exchanged,  we  have  b'j  > 0 and  BOUND  = b'j  + j — 1 
after  the  rth  pass. 

6.  Solution  1:  An  element  displaced  farthest  to  the  right  of  its  final  position  moves 
one  step  left  on  each  pass  except  the  last.  Solution  2 (higher  level):  By  exercise  5. 1.1-8, 
answer  (f),  o(  — i = bi  — Ci,  for  1 < i < n,  where  Ci  C2  . . . cn  is  the  dual  inversion  table. 
If  bj  = max(6i, . . . , bn)  then  Cj  = 0. 

7.  (2(n  + 1)(1  + P(n)  - P(n  + 1))  - P(n)  - P(n)2)1/2  = y/(2  - tt/2 )n  + 0(1). 

8.  For  i < k + 2 there  are  j + k — i + 1 choices  for  bp,  for  k + 2 < i < n — j + 2 there 
are  j — 1 choices;  and  for  i > n — j + 2 there  are  n — i + 1. 

10.  (a)  If  * = 2fe  — 1,  from  ( k — 1,  aj  — k ) to  ( k , at  — k).  If  i = 2 k,  from  (a;  — k,  k — 1) 
to  ( ai  — k,  k).  (b)  Step  a^k-x  is  above  the  diagonal  k < a^k-i  — k <=>  a2fc-i  > 2k 
-*=!>  a2k- 1 > a2 k 4=>  a2k  < 2k  - 1 <=>  a2k  - k < k - 1 <=»  step  a2k  is  above 
the  diagonal.  Exchanging  them  interchanges  horizontal  and  vertical  steps,  (c)  Step 
a2k+d  is  at  least  m below  the  diagonal  <^=>  k + m — 1 > a2k+d  — (k  + m)  + m <=> 
a2k+d  <2 k + m <=>  a2k  >2 k + m a2k  — k > k + m <=$•  step  a2k  is  at  least  m 
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below  the  diagonal.  (If  a2k+d  <2 k + m and  a2k  < 2k + m,  there  are  at  least  (k  + m)  + k 
elements  less  than  2k  + m;  that’s  impossible.  If  a2k+d  >2 k + m and  a2k  > 2 k + m, 
one  of  the  > must  be  > ; but  we  can’t  fit  all  of  the  elements  <2 k + m into  fewer  than 
(k  + m)  + k positions.  Hence  02fc+2m-i  < «2fc  if  and  only  if  02fc+2m-i  < 2k  + m if  and 
only  if  2k  + m < a2k ■ A rather  unexpected  result!) 

11.  16  10  13  5 14  6 9 2 15  8 11  3 12  4 7 1 (61  exchanges),  by  considering  the  lattice 
diagram.  The  situation  becomes  more  complicated  when  N is  larger;  in  general,  the  set 
{K2,  K±, . . . } should  be  {1,  2, . . . , Af-1,  M,  M+2,  M+4, . . . ,2[N/2\-M},  permuted 
so  as  to  maximize  the  exchanges  for  \N/2\  elements.  Here  M = [2fc/3],  where  k 
maximizes  fc|^V/2j  — |((3fc  — 2)2fc_1  + (— l)fc).  The  maximum  total  number  of  exchanges 
is  1 — 2 lg  lg  IV/ lg  AT  + 0(1/  log  N)  times  the  number  of  comparisons  [R.  Sedgewick, 
SICOMP  7 (1978),  239-272]. 

12.  The  following  program  by  W.  Panny  avoids  the  AND  instruction  by  noting  that 
step  M4  is  performed  for  i = r + 2 kp  + s,  k > 0,  and  0 < s < p.  Here  TT  = 2f_1, 
p = rll,  r = rI2,  i = rI3,  i + d — N = rI4,  and  p — 1 — s = rI5;  we  assume  that  N >2. 


01 

START 

ENT1 

TT 

1 

Ml.  Initialize  v.  v <—  2t_1 . 

02 

2H 

ENT2 

TT 

T 

M2.  Initialize  a.  r,  d. 

03 

ST2 

Q ( 1 : 2) 

T 

q «-  2t_1. 

04 

ENT2 

0 

T 

r <—  0. 

05 

ENT4 

0,1 

T 

rI4  +—  d. 

06 

3H 

ENT3 

0,2 

A 

M3.  Loop  on  i.  i <—  r. 

07 

INC4 

-N,3 

A 

rI4  «—  i + d — N . 

08 

8H 

ENT5 

"1,1 

D + E 

s <—  0. 

09 

4H 

LDA 

INPUT+1 ,3 

C 

M4.  Compare/exchange  Ri+%  : . 

10 

CMPA 

INPUT+N+1 ,4 

C 

11 

JLE 

*+4 

c 

Jump  if  Ki+ 1 < Ki+d+ 1- 

12 

LDX 

INPUT+N+1, 4 

B 

13 

STX 

INPUT+1, 3 

B 

Ri+ i t-t  Ri+d+i . 

U 

STA 

INPUT+N+1, 4 

B 

15 

J5Z 

7F 

C 

Jump  if  s = p — 1. 

16 

DEC  5 

1 

C-D 

s +-  s + 1. 

17 

INC3 

1 

C-D 

i «—  i + 1. 

18 

INC4 

1 

C-D 

19 

J4N 

4B 

C-D 

Repeat  loop  if  i + d < N. 

20 

JMP 

5F 

E 

Otherwise  go  to  M5. 

21 

7H 

INC3 

1,1 

D 

i «-  i + p+  1. 

22 

INC4 

1,1 

D 

23 

J4N 

4B 

D 

Repeat  loop  if  i + d < N. 

24 

5H 

ENT2 

0,1 

A 

M5.  Loop  on  a.  r +-  v. 

25 

Q 

ENT4 

* 

A 

rI4  +-  q. 

26 

ENTA 

0,4 

A 

27 

SRB 

1 

A 

28 

STA 

Q(l:2) 

A 

q «—  q/2. 

29 

DEC4 

0,1 

A 

rI4  +-  d. 

30 

J4P 

3B 

A 

To  M3  if  d ± 0. 

31 

6H 

ENTA 

0,1 

T 

M6.  Loop  on  v. 

32 

SRB 

1 

T 

33 

STA 

*+1(1:2) 

T 
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34  ENT1  * T p<-[p/2j. 

35  J1NZ  2B  T To  M2  if  p ^ 0.  | 

The  running  time  depends  on  six  quantities,  only  one  of  which  depends  on  the  input 
data  (the  remaining  five  are  functions  of  N alone):  T = t,  the  number  of  “major 
cycles”;  A = t(t  + l)/2,  the  number  of  passes  or  “minor  cycles”;  B = the  (variable) 
number  of  exchanges;  C = the  number  of  comparisons;  D = the  number  of  blocks  of 
consecutive  comparisons;  and  E = the  number  of  incomplete  blocks.  When  N = 2t , it 
is  not  difficult  to  prove  that  D = (t  — 2 )N  + t + 2 and  E = 0.  For  Table  1,  we  have 
T = 4,  A =10,  5 = 3 + 0 + 1 + 4 + 0 + 0 + 8 + 0 + 4 + 5 = 25,  C = 63,  D = 38,  E = 0, 
so  the  total  running  time  is  11 A + 6B  + 10C  + 2E  + 12 T + 1 = 939u. 

In  general  when  N = 2ei  + • • • + 2er,  Panny  has  shown  that  D = ei(N  + 1)  — 
2(2ei  - 1),  E = (ei-e-)  + (ei  + e2  + . . . + er_0  - (ei  - l)(r  - 1). 

13.  No,  nor  are  Algorithms  Q or  R. 

14.  (a)  When  p = 1 we  do  (2t_1  - 0)  + (2i_1  - 1)  + (2t_1  - 2)  + (2*-1  - 4)  H + 

(2*”1—  2*~2)  = (t  — l)2t_1  + l comparisons  for  the  final  merge,  (b)  xt  = + — 1)  + 

2~*  = ' ' ' = Zo+Eo<*<t(^+2~'t~1)  = + Hence  c(2‘)  = 2‘-2(t2— t+4)-l. 

15.  (a)  Consider  the  number  of  comparisons  such  that  i + d = IV;  then  use  induction 

on  r.  (b)  If  b(n)  = c(n  +1),  we  have  b(2n)  = a(l)  H b a(2n)  = a(0)  + o(l)  + a(l)  + 

• • • + a(n  — 1)  + a(n)  + s(l)  + x(2)  + ■ • • + x{2 n)  = 2b(n)  + y{2n)  — a(n);  similarly 
b(2n  + 1)  = 2 b(n)  + y(2n  + 1).  (c)  See  exercise  1.2.4-42.  (d)  A rather  laborious 
calculation  of  ( z(N ) + 2z(\N/2\)  + •••)  — a(N),  using  formulas  such  as 

= E2‘("-‘)=2»+‘-("  + 2)-l, 

k= 0 k= 0 

leads  to  the  result 

c(N)  = IV  Q ( ^ ) + 2ex  - i)  - 2ei(er  - 1)  - 1 


16.  Consider  the  (2”)  lattice  paths  from  (0,0)  to  (n,  n)  as  in  Figs.  11  and  18,  and 
attach  weight  f(i  - j)  if  i > j,  f(j  — i — 1)  + 1 if  i < j,  to  the  line  from  (i,j)  to 
( i + 1,  j);  here  f(k)  is  the  number  of  bit  variations  br  / br+i  in  the  binary  expansion 
k = (...  b2f>i  60)2-  The  total  number  of  exchanges  on  the  final  merge  when  N = 2n 
is  then  Eo <3<i<n(2/0)  + !)  (*’-/)  Sedgewick  showed  that  this  sum 

simplifies,  for  general  /,  to  f (2„n)  +2  Efe>!  (n2_\)  Eo <j<k  then  he  used  the  gamma 
function  method  to  obtain  the  asymptotic  formula 


'2  n 


-nlgn  + 


lg 


r(i/4)2 

27T 


+h^i+^ 


, + 0(\/n\ogn) 


)■ 


where  8(n)  is  a periodic  function  of  lgn  with  magnitude  bounded  by  .0005.  Hence 
about  1/4  of  the  comparisons  lead  to  exchanges,  on  the  average,  as  n — t 00.  [SICOMP 
7 (1978),  239-272;  see  also  Flajolet  and  Odlyzko,  SIAM  J.  Discrete  Math.  3 (1990), 
238-239.] 

17.  Kn+ 1 is  inspected  when  we  are  sorting  a subfile  with  r = N and  Ki  the  largest 
key.  Ko  is  inspected  during  step  Q9  if  left-to-right  minima  sink  to  position  R\ . 
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18.  Steps  Q3  and  Q4  make  only  a single  change  to  i and  j before  exiting  to  Q5;  the 
partitioning  process  for  Rt . . . Rr  ends  with  j = \(l  + r) /2]  in  step  Q7,  bisecting  the 
subfile  as  perfectly  as  possible.  Quantitatively  speaking,  we  replace  (17)  by  A = 1, 
B = |_(iV  — 1)/2J,  C = N + (N  mod  2);  this  puts  us  essentially  in  the  best  case  of  the 
algorithm  (see  exercise  27),  except  that  B « \C.  If  the  “<”  signs  in  steps  Q3  and  Q4 
are  changed  to  “<  the  algorithm  won’t  sort  any  more;  even  if  we  assume  “<”  signs 
in  (13),  it  will  interchange  R0  with  R\ , then  the  third  partitioning  phase  will  move  the 
original  Rq  to  position  R2,  etc.  — a real  catastrophe. 

19.  Yes,  the  other  subfiles  may  be  processed  in  any  order.  But  the  queue  will  contain 
Q(AT/-\/log  N ) items  when  each  partitioning  step  divides  the  file  equally,  while  a stack 
is  guaranteed  to  stay  much  smaller  than  this  (see  the  next  exercise). 

20.  max(0,  [lg(7V+2)/(M+2)J).  (The  worst  case  occurs  when  N = 2 k(M  + 2)  — 1 and 
all  subfiles  are  perfectly  bisected  when  they  are  partitioned.) 

21.  Exactly  t records  move  to  the  area  R,+i  . . . RN  in  step  Q6,  hence  B — t.  The 
partitioning  phase  ends  with  j = s,  hence  C — C'  = N+l  — s is  the  number  of  times 
j decreases.  We  must  also  have  i = s + 1 in  step  Q7  when  the  keys  are  distinct,  since 
i — j implies  Kj  — K;  thus  C'  = s. 

22.  The  stated  relations  for  An(z)  follow  because  As-\(z) An-s(z)  is  the  generating 
function  for  the  value  of  A after  independently  sorting  randomly  and  independently 
ordered  files  of  sizes  s — 1 and  N — s.  Similarly,  we  obtain  the  relations 

N s 

Bn(z)  = EE  bstN  z*  Bs-i  (z)  Bn -s  {z) , 

s=l  t=0 


Cn (z)  = ±J2zn+1Cs-1(z)CN-s(z), 

8=1 

1 N 

Dn(z)  = — E D„-i(z)DN-3(z), 

8=1 

1 N 

En(z)  = —^2es-1(z)EN-s{z), 

8=1 

= jjitzlM+1<a<N-M]S,-i(z)SN-,(z), 

8=1 

for  N > M.  Here  6stjv  is  the  probability  that  s and  t have  given  values  in  a file  of 
length  N,  namely 

which  is  (1/iV!)  times  the  (s  — 1)!  ways  to  permute  {1, . . . , s — 1}  times  the  (N  — s)\  ways 
to  permute  (s  + 1, . . . , N}  times  the  (3^1)  (JV~S)  patterns  with  t displaced  elements  on 
each  side.  For  0 < N < M,  we  have  Bn(z)  = Cn{z)  = Sn(z)  = 1;  Dn(z)  = 

n£Li((l  + (*  - l)*)/fc);  and  En(z)  = nf=1((l  + 2 + • ' • + »k-1)/k). 

[It  is  interesting  to  consider  the  behavior  of  these  generating  functions  when  N is 
large;  a sequence  analogous  to  Cn(z),  but  with  zN+1  replaced  by  zN~1 , is  known  to 
converge  to  a non-normal  probability  distribution  that  has  not  yet  been  fully  analyzed. 
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See  the  articles  by  P.  Hennequin,  M.  Regnier,  and  U.  Rosier  in  RAIRO  Theoretical 
Informatics  and  Applications  23  (1989),  317-333;  23  (1989),  335-343;  25  (1991),  85- 
100.] 

23.  When  N > M,  An  = 1 + ( 2/JV ) So <k<N  ^k',  Bn  = So <t<s<N  b.tN(t  + Bs- 1 + 
Bn-.)  = (l/A0Sf=1((s  - l)(N  - s)/(N  - 1)  + Bs^  + BN _7)  = (N  - 2)/6  + 
(2/N)  So<fc<JV-Bfc  [see  exercise  22];  Dn  = (2/iV)  Ylo<k<N Dk;  En  is  similar.  When 
N > 2 M + 1,  Sn  = (2 /N)  So <k<N  &k  + (N  — 2 M — 2 )/N.  Each  of  these  recurrences 
has  the  form  (19)  for  some  function  fn. 

24.  The  recurrence  Cn  = N — 1 + (2/iV)So <k<N^k,  for  N > M,  has  the  solution 
(IV  + l)(2ifjv+i  — 2Hm+2  + 1 — 4/(M  + 2)  + 2/(iV  + 1)),  for  N > M.  (So  we  could  save 
about  4 N/M  comparisons.  But  each  comparison  takes  longer  if  it  must  be  followed  by 
a test  of  i versus  j,  so  we  lose,  unless  the  cost  of  a key  comparison  exceeds  | M In  N 
times  the  cost  of  a register  comparison.  Many  texts  on  sorting  fail  to  realize  that  such 
an  “improvement”  makes  quicksort  significantly  less  quick!) 

25.  (Use  (17)  repeatedly  with  s = 1.)  A = N — M,  B = 0,  C = (NJ2)  — (M2H2)i 
D = E = S = 0. 

26.  Actually  you  can’t  do  worse  than  to  sort 

12  3 ...  N-M  N N- 1 ...  iV-M+1; 

the  subtler  answer  N M—  1 M— 2 ...  1 M M+ 1 ...  IV— 1 is  an  equally  bad  case. 
This  is  only  a little  worse  than  exercise  25,  because  it  makes  D = M — 1,  E = (')f) . 

27.  12  2 3 1 8 6 7 5 9 10  11  4 16  14  15  13  20  18  19  17  21  22  23,  which  requires  546u.  It 
can  be  shown  that  the  best  case  for  N = 3(M  + l)2k  — 1 occurs  when  the  subfiles  are 
bisected  by  each  partitioning  until  reaching  size  3M  + 2;  then  a trisection  is  performed 
to  avoid  stack-pushing  overhead.  We  have  A = 3 ■ 2k  — 1,  C = (k  + |)(Ar  + 1), 
S — 2k  — 1,  B = D = E = 0.  (The  behavior  of  the  best  case  for  general  M and  N 
makes  an  interesting  but  complex  pattern.) 

28.  The  recurrence 


C„  — n + 1 + Yn\  ~ — k)Ck-i 

U)  fc=1 

can  be  transformed  into 

(g)c„  - 2(n  “ ^Cn-i  + (n  3 2)cn— 2 = 2(n  - 1 )(n  - 2)  + 2 (n  - 2 )C„_2. 

29.  In  general,  consider  the  recurrence 


Cn  — Tl  1 + 


ck. 


which  arises  when  the  median  of  2t  + 1 elements  governs  the  partitioning.  Letting 
C(z)  = Cnzn , the  recurrence  can  be  transformed  to  (1  — z)t+l Cl'2t+1\z) / (2t+2)\  — 
1/(1  - z)t+2  +Cw(z)/{t+  1)! . Let  f(x)  = C(t)(l  - x);  then  p((i9)/(x ) = (2t  + 2)!/xt+2, 
where  d denotes  the  operator  x(d/dx),  and  pt(x)  = (t—x)~  — (2t+2)Ud.  The  general 
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solution  to  (d-a)g(x)  = x13  is  g(x)  = x^/(/3  - a) + Cxa,  for  a /3;  g(x)  = x^lnx  + C) 
for  a = / 3.  We  have  pt(—t  — 2)  = 0;  so  the  general  solution  to  our  differential  equation  is 


C(t\z)  = (2t  + 2)!ln(l  - z)/p't(-t  - 2)(1  - z)t+2  + £c,-(l  - *)"' 

3=0 

where  ao, . . . , at  are  the  roots  of  pt{x)  = 0,  and  the  constants  Ci  depend  on  the  initial 
values  Ct, . . . , C2t • The  handy  identity 

(i  - z)™+i  ln(r^) = Ej 'SHn+™  - H •»)  ("  tT) m-0, 

now  leads  to  the  surprisingly  simple  closed  form  solution 


Cn  = 


Hn 


+1 


H, 


t+i 


H2t+2  — Ht+ 1 


("  + 1)  + jEci(“a»)r 


3=0 


from  which  the  asymptotic  formula  is  easily  deduced.  (The  leading  term  n\nn/ 
( H2t+2  — Ht+ i)  was  discovered  by  M.  H.  van  Emden  [CACM  13  (1970),  563-567] 
using  an  information-theoretic  approach.  In  fact,  suppose  we  wish  to  analyze  any 
partitioning  process  such  that  the  left  subfile  contains  at  most  xN  elements  with 
asymptotic  probability  f*  f(x)  dx,  as  N ->  oo,  for  0 < x < 1;  van  Emden  proved  that 
the  average  number  of  comparisons  required  to  sort  the  file  completely  is  asymptotic 
to  a~xn\nn , where  a = —1/ fx(f(x)  + f(l  — x))xlnxdx.  This  formula  applies  to 
radix  exchange  as  well  as  to  quicksort  and  various  other  methods.  See  also  H.  Hurwitz, 
CACM  14  (1971),  99-102.) 

30.  Solution  1 (of  historic  interest):  Each  subfile  may  be  identified  by  four  quantities 
( l,r,k,X ),  where  l and  r are  the  boundaries  (as  presently),  k indicates  the  number 
of  words  of  the  keys  that  are  known  to  be  equal  throughout  the  subfile,  and  X is  a 
lower  bound  for  the  (k  + l)st  words  of  the  key.  Assuming  nonnegative  keys,  we  have 
(l,r,k,X)  = (1,77,0,0)  initially.  When  partitioning  a file,  we  let  K be  the  (k  + l)st 
word  of  the  test  key  Kq.  If  K > X,  partitioning  takes  place  with  all  keys  > K at 
the  right  and  all  keys  < K at  the  left  (looking  only  at  the  ( k + l)st  word  of  the  key 
each  time);  the  partitioned  subfiles  get  the  respective  identifications  (l,j—l,k,X)  and 
( j,r,k,K ).  But  if  K = X,  partitioning  takes  place  with  all  keys  > K at  the  right 
and  all  keys  < K [actually  = K]  at  the  left;  the  partitioned  subfiles  get  the  respective 
identifications  ( l,j,k  + 1,0)  and  (j  + 1 ,r,k,K).  In  both  cases  we  are  unsure  that  Rj 
is  in  its  final  position  since  we  haven’t  looked  at  the  ( k + 2)nd  words.  Obvious  further 
changes  are  made  to  handle  boundary  conditions  properly.  By  adding  a fifth  “upper 
bound”  component,  the  method  could  be  made  symmetrical  between  left  and  right. 

Solution  2,  by  Bentley  and  Sedgewick  [SODA  8 (1997),  360-369]:  In  a subfile 
identified  by  (l,  r,  k),  let  K be  word  k + 1 of  Kq  as  in  solution  1,  but  use  the  algorithm 
of  exercise  41  to  tripartition  the  subfile  into  (l,  i — 1,  k),  ( i,j,k  + 1),  (j  + 1,  r,  k)  for 
the  cases  <K,  =K , > K . This  approach,  which  the  authors  call  multikey  quicksort, 
is  significantly  better  than  solution  1,  and  it  is  competitive  with  the  fastest  known 
methods  for  sorting  strings  of  characters. 

31.  Go  through  a normal  partitioning  process,  with  Ri  finally  falling  into  position  Rs. 
If  s = m,  stop;  if  s < m,  use  the  same  technique  to  find  the  ( m — s)th  smallest  element 
of  the  right-hand  subfile;  and  if  s > m,  find  the  mth  smallest  element  of  the  left-hand 
subfile.  [CACM  4 (1961),  321-322;  14  (1971),  39-45.] 
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R.  G.  Dromey  [Software  Practice  & Experience  16  (1986),  981-986]  has  observed 
that  fewer  comparisons  and  exchanges  are  needed  if  we  stop  each  partitioning  stage  as 
soon  as  i or  j has  reached  position  m. 

32.  The  recurrence  is  Cn  = 0 and  Cnm  = n + 1 + (Anm  + Bnm)/n  for  n > 1,  where 

Anm  ^ ^ ^(n  — s)(m  — s)  and  Bnm  — ^ ] C*(s_ l)m, 

l<s<m  m<a<n 

for  1 < m < n.  Since  A(n+1)(m+1j  = Anm  + Cnm  and  B(n+1)m  = Bnm  + Cnm,  we  can 
first  find  a formula  for  the  quantity  Dn  = (n+  l)C'(n+1)(m+1) -nC„m,  then  sum  this  to 
obtain  the  answer  2((n+l)ffn-(n+2-m)J/n+1_m-(rn+l)i7m+n+|)-i5mn-|<5ml- 
When  n = 2m -1,  it  becomes  4m(ft2m-i  - Hm)  +4m -4Hm  + f (1  - Sml)  = 
(4  + 41n2)m  - 41nm  - 47  - | + 0(m_1)  « 3.39n.  [See  D.  E.  Knuth,  Proc.  IFIP 
Congress  (1971),  19-27.] 

Another  solution  follows  from  the  theory  of  Section  6.2.2:  Suppose  the  keys  are 
{1, 2, . . . , n},  and  let  Xjk  be  the  number  of  common  ancestors  of  nodes  j and  k in  the 
binary  search  tree  corresponding  to  quicksort.  Then  the  number  of  comparisons  made 
by  the  algorithm  of  exercise  31  can  be  shown  to  be  ]T"=1  Xjm  + Xmm  - 2 [node  m 
is  a leaf].  The  probability  that  node  i is  a common  ancestor  of  nodes  j and  k in  a 
random  binary  search  tree  is  l/(max(i,  j,  k)  - min(i,  j,  k)  + l).  We  obtain  the  average 
number  of  comparisons  from  the  facts  that  EX,*,  = Hk  + Hn+i-j  + 1 - 2 Hk-j+i  for 
1 < j < k,  and  Pr(node  m is  a leaf)  = Pr(m  isn’t  followed  by  m ± 1 in  a random 
permutation)  = § + ±<5mi  + \ 5mn  + |<5mi5mn.  [See  R.  Raman,  SIGACT  News  25,2 
(June  1994),  86-89.] 

For  an  analysis  of  a similar  selection  algorithm  that  uses  median-of-three  parti- 
tioning, see  Kirschenhofer,  Prodinger,  and  Martinez,  Random  Structures  & Algorithms 
10  (1997),  143-156.  Asymptotically  faster  methods  are  discussed  in  exercise  5.3.3-24. 

33.  Proceed  as  in  the  first  stage  of  radix  exchange,  using  the  sign  instead  of  bit  1. 

34.  We  can  avoid  testing  whether  or  not  i < j,  as  soon  as  we  have  found  at  least  one 
0 bit  and  at  least  one  1 bit  in  each  stage  — that  is,  after  making  the  first  exchange  in 
each  stage.  This  saves  approximately  2 C units  of  time  in  Program  R. 

35.  A = N - 1,  B = (min  0,  ave  jiVlgiV,  max  ±NlgN),  C = N\gN,  G = |1V, 
K = L = R = 0,  S = | N — 1,  X = (min  0,  ave  |(1V  — 1),  max  N — 1).  In  general,  the 
quantities  A,  C,  G,  K,  L,  R,  and  S depend  only  on  the  set  of  keys  in  the  file,  not  on 
their  initial  order;  only  B and  X are  influenced  by  the  initial  order  of  the  keys. 

36.  (a)EO(‘)(-l)%  = = Z = a„.  (b)  (6*0); 

(— <5ni);  ((— l)m<5nm);  ((1  — o)n);  ((^)(— a)m(l  — a)n  m).  (c)  Writing  the  relations  to  be 
proved  as  xn  = yn  = an  + zn,  we  have  y„  = an  + zn  by  part  (a);  also  21~r‘]Tfc>2 (")yfc  = 
Zn,  SO  yn  satisfies  the  same  recurrence  as  xn.  [See  exercises  53  and  6.3-17  for  some 
generalizations  of  this  result.  It  does  not  appear  to  be  easy  to  prove  directly  that 
xn  = an  2n-1/(2n-1  - 1).] 

37.  (Eracm(2»j2-»)  for  an  arbitrary  sequence  of  constants  co,  ci,  C2,  ....  [This 
answer,  although  correct,  does  not  reveal  immediately  that  (l/(n+  1))  and  (n  - S„i) 
are  such  sequences!  Sequences  having  the  form  (a„  + an)  are  always  self-dual.  Notice 
that,  in  terms  of  the  generating  function  A{z)  = J 2anzn/n\ , we  have  A(z)  = ezA(-z); 
hence  A = A is  equivalent  to  saying  that  A(z)e~z /2  is  an  even  function.] 
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38.  A partitioning  stage  that  yields  a left  subfile  of  size  s and  a right  subfile  of  size 
N — s makes  the  following  contributions  to  the  total  running  time: 

A = 1,  B = t,  C = N,  K — (5si,  L = (5so>  R — Ssn,  X = h, 

where  t is  the  number  of  keys  Ki,...,Ks  with  bit  b equal  to  1,  and  h is  bit  b of  Ka+ 1; 
if  s — N,  then  h — 0.  (See  (17).)  This  leads  to  recurrence  equations  such  as 

e (;)(w;‘)<‘+b-+b»-> 

0 <t<s<NKZy  ^ 1 ' 

= i(JV-l)  + 2 1~NJ2(Ns)Bs'  for  N > 2;  Bo  = Bi  = 0. 

s> 2 V 

(See  exercise  23.)  Solving  these  recurrences  by  the  method  of  exercise  36  yields  the 
formulas  AN  = Vjv  - UN  + 1,  BN  = \{UN  + N - 1),  Cn  = Vn  + N,  KN  = N/2, 
Ln  = Rn  = |(Vj v — Un  — N)  + 1,  Xn  — \An-  Clearly  Gn  = 0. 

39.  Each  stage  of  quicksort  puts  at  least  one  element  into  its  final  position,  but  this 
need  not  happen  during  radix  exchange  (see  Table  3). 

40.  If  we  switch  to  straight  insertion  whenever  r — l < M in  step  R2,  the  problem 
doesn’t  arise  unless  more  than  M equal  elements  occur.  If  the  latter  is  a likely  prospect, 
we  can  test  whether  or  not  Ki  = ■ ■ ■ = Kr  whenever  j < l or  j = r in  step  R8. 

41.  Lutz  M.  Wegner  [IEEE  Trans.  C-34  (1985),  362-367]  has  discussed  several  ap- 
proaches, of  which  the  following  (as  simplified  by  Bentley  and  Mcllroy  in  Software 
Practice  & Exp.  23  (1993),  1256-1258)  appears  to  be  best  in  practice.  The  basic  idea 
is  to  work  with  the  five-part  array 


= K 

< K 

? 

> K 

= K 

l 

a 

b c 

d 

r 

until  the  middle  part  is  empty,  then  swap  the  two  ends  into  the  middle. 

Dl.  [Initialize.]  Set  a <—  b <—  l,  c <—  d 4—  r. 

D2.  [Increase  b until  Kb  > K.\  If  b < c and  Kb  < K,  increase  b by  1 and  repeat 

this  step.  If  b < c and  Kb  = K,  exchange  Ra  Rb,  increase  a and  b by  1, 

and  repeat  this  step. 

D3.  [Decrease  c until  Kc  < K.]  If  b < c and  Kc  > K,  decrease  c by  1 and  repeat 

this  step.  If  b < c and  Kc  = K , exchange  Rc  Rd,  decrease  c and  d by  1, 

and  repeat  this  step. 

D4.  [Exchange.]  If  b < c,  exchange  Rb  «->  Rc,  increase  b by  1,  decrease  c by  1,  and 
return  to  D2. 

D5.  [Cleanup.]  Exchange  Ri+k  Rc-k  for  0 < k < min(o  — l,  6 — a);  also  exchange 
Rb+k  Rr-k  for  0 < k < min(d  — c, r — d ).  Finally  set  i -4—  l + b — a, 
j <-  r - d + c.  | 

Straightforward  modifications  to  step  Dl  will  handle  degenerate  cases  efficiently 
and  ensure  that  a < b and  c < d before  we  get  to  D2.  Then  the  tests  “6  < c”  in  D2 
and  D3  will  be  unnecessary;  see  exercise  24.  Furthermore,  this  change  will  keep  those 
steps  from  needlessly  exchanging  records  with  themselves. 

One  of  the  main  applications  of  sorting  is  to  bring  records  with  equal  keys  to- 
gether. Therefore  this  tripartitioning  scheme  is  often  preferable  to  the  bipartitioning 
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of  Algorithm  Q.  The  exchanges  in  step  D5  are  efficient  because  all  records  with  keys 
equal  to  K are  now  in  their  final  resting  place. 

This  exercise  is  due  to  W.  H.  J.  Feijen,  who  called  it  the  “Dutch  national  flag 
problem” : Given  a set  of  red,  white,  and  blue  tokens  arranged  randomly  in  a column, 
decide  how  to  swap  pairs  of  tokens  so  that  the  red  ones  will  all  be  at  the  top  and  the 
blue  ones  all  at  the  bottom,  while  looking  at  each  token  only  once  and  using  only  a 
few  auxiliary  variables  to  control  the  process.  [See  E.  W.  Dijkstra,  A Discipline  of 
Programming  (Prentice-Hall,  1976),  Chapter  14.] 

42.  This  is  a special  case  of  a general  theorem  due  to  R.  M.  Karp;  see  JACM  41  (1994), 
1136-1150,  §2.8.  Significantly  sharper  asymptotic  bounds  for  tails  of  the  quicksort 
distribution  have  been  obtained  by  McDiarmid  and  Hayward,  J.  Algorithms  21  (1996), 
476-507. 

43.  As  a -A  0+,  we  have  f3ya~1(e~y  - 1)  dy  + ya~1e~v  dy  = T(a)  - 1/a  = 
(r(a+  1)  — T(l))/a  — > T'(l)  = —7,  by  exercise  1.2.7-24. 

44.  Fork  > 0,  we  have  rk(m)  ~ |(2m)(fc+1)/2r((fc  + l)/2) -8k0  - J2i>o{-iy  Bk+2j+1/ 

((k  + 2 j + 1)/!  (2rn)3).  When  k = —1,  the  contributions  from  ffk3~1>(m)  in  (36) 
cancel  with  similar  terms  in  the  expansion  of  Hm-i,  and  we  have  r_i(m)  = Hm-1  + 
(l/\/2 m)  J2t> 0 /- iW  ~ § (ln(2m)  + 7)  — Bij /{2j)j\{2my . Therefore 

the  contribution  to  Wm- 1 from  the  term  N*/t  of  (33)  is  obtained  from  the  sum 
mEt>  1 f_1  exp(— t2/2m)(l  - t3/3m2  + t6/18m4)(l  - t4/4m3)(l  - t/2m  - t2/8m2)  + 
0(m-!/2)  = |mlnm  + §(ln2  + 7)m-  § + 0(m-1/2).  The  term  N t_1 

contributes  Y.t>i  exp(-f2/2m)(l  - f3/3m2)(l  - t/2m)(l  + t/m)  + 0(m-i/2)  = 

The  term  yields  And  finally  the  term  contributes 

F2m~X  T,t>i  f exp(-t2/2m)  + 0(m-i/2)  = 1 + 0(m-1/2). 

45.  The  argument  used  to  derive  (42)  is  also  valid  for  (43),  except  that  we  leave  out 
the  residues  at  z = — 1 and  2 = 0. 

46.  Proceeding  as  we  did  with  (45),  we  obtain  (s  - 1)!/ In 2 + Ss(n),  where 

2 ^ 

$s(ti)  = J^^(r(s  — 27rifc/ln2)  exp  (27r*fc lgn)). 

z fc>i 


[Note  that  |P(s  + it) |2  = (FIo<fc<s(^2  + t2))n/(tsinhnt),  for  integer  s > 0,  so  we  can 
bound  <5s(n).] 

47.  In  fact,  Ylj>  1 e~n/2J  (n/23)*  equals  the  integral  in  exercise  46,  for  all  s > 0. 

48.  Making  use  of  the  intermediate  identity 


1 — e 


- f 

2m  ]_ 


-1/2+ioo 


1/2— too 


r(2)a:  2 dz, 


we  proceed  as  in  the  text,  with  1 - e~x  playing  the  role  of  e~x  - 1 + x;  Vn+i /{n  + 1)  = 
(— l/2m)  T(*)n  z dz/{ 2~z  — 1)  + 0(n~1),  and  the  integral  equals  lgn  + 

7/ In  2 - | - 60(n)  + O(n_10°)  in  the  notation  of  exercise  46.  [Thus  the  quantity 
AN  in  exercise  38  is  !V(l/ln2  - S0(N  - 1)  - S-i(N))  + 0(1).] 

49.  The  right-hand  side  of  Eq.  (40)  can  be  improved  to  the  estimate  e~x(l  — \x2/n  + 
0((a;3+a;4)n_2)).  The  effect  is  to  subtract  half  the  sum  in  exercise  47,  replacing  0(1) 
in  (50)  by  2 — |(l/ln2  + <5i(n))  + 0(n_1).  (The  “2”  comes  from  the  “2 /n”  in  (46).) 
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50.  Um„  = rclogmn  + n((7— 1)/Inm— !+<5-i(n))+m/(m-l)-l/(21nm)  — |<5i(n)  + 
0(n_1),  with  Ss(n)  as  in  exercise  46  but  replacing  In 2 and  lg  by  lnm  and  logm.  [Note: 
For  m = 2,  3,  4,  5,  10,  100,  1000,  and  106  we  have  <5_i(n)  < .000000172501,  .000041227, 
.0002963,  .0008501433,  .0062704,  .06797,  .1525,  and  .348,  respectively.] 

51.  Let  N = 2m.  We  may  extend  the  sum  (35)  over  all  t > 1,  when  it  equals 

i /»a+too  1 /*a+ioo 

£ / r(z)(t2/N)~ztk  dz=  — T(z)Nza 2z  - k)  dz, 

2™  Ja-ioo  2™  J a — too 

provided  that  a > (k  + l)/2.  So  we  need  to  know  properties  of  the  zeta  function. 
When  %t(w)  > —q,  we  have  ((w)  = 0(|w|?+1)  as  |w|  — > 00;  hence  we  can  shift  the  line 
of  integration  to  the  left  as  far  as  we  please  if  we  only  take  the  residues  into  account.  The 
factor  T(z)  has  poles  at  0,  —1,  —2,  . . . , and  £( 2z  — k)  has  a pole  only  at  z = (k  + l)/2. 
The  residue  at  2 = —j  is  N~i ( — 1)3£(— 2j  — k)/j\,  and  £(— n)  = (— l)nBn+i/(n  + 1). 
The  residue  at  z = (k  + l)/2  is  |r((fc  + l)/2)N^k+1^2.  But  when  k = — 1 there  is  a 
double  pole  at  2 = 0;  and  £(2)  = 1 /(z  — 1)  + 7 + 0(\z  — 1|),  so  the  residue  at  0 in  this 
case  is  7 + | lnfV  — ^7.  We  therefore  obtain  the  asymptotic  series  mentioned  in  the 
answer  to  exercise  44. 

52.  Set  x = t/n;  then 

(n2”  t) / (2jT)  = exP(_2n(x2/l  • 2 + ®4/3  -44 ) 4-  (x2/2  + x4/4  4 ) 

- (l/6n)(x2  - x4  4 ) 4 ); 

the  desired  sum  can  now  be  expressed  in  terms  of  J2t>i  tkd(t)e~t2^n,  for  various  k. 
Proceeding  as  in  exercise  51,  since  ((z)2  = X]«>i  d(t)t~z,  we  wish  to  evaluate  the 
residues  of  T(z)nz<^(2z  — k)2  when  k > 0.  At  z = —j  the  residue  is 

n-j(-iy(B2j+k+1/(2j  + k + l))2/j\, 

and  at  2 = (k  + l)/2  it  is  n(fc+1)/2r((fc  4-  l)/2)(7  4-  \ Inn  4-  \ip{(k  4-  l)/2)),  where 
ip(z)  = T'(z)/T(z)  = Hz- 1 — 7;  thus,  for  example,  when  k = 0,  ]C(>1  e~t2/,nd(f)  = 
\y/nn  Inn  4-  (§7  - § ln2)v/rrn  4-  3 4-  0(n~M)  for  all  M.  For  S„/(2”),  add  Inn  4- 
^74-^  — ^ Iri2)v/7r/n4-0(n_1)  to  this  quantity.  (See  exercises  1.2.7-23  and  1.2.9-19.) 

53.  Let  q = 1 — p.  Generalizing  exercise  36(c),  if 

1 V ^ \ / k n — k , k n—k\ 

Xn  = an  + 2_^yk)  (P  1 +1P  )xk, 


then 


tn=an  + £(;)(- l)k  dk(pk  + ?*)/(!  - Pk  - qk)- 


We  can  therefore  find  Bn  and  Cn  as  before;  the  factor  | in  Bn  should  be  replaced 
by  pq.  The  asymptotic  examination  of  Un  proceeds  essentially  as  in  the  text,  with 


r-  e (:)<■ 

*>1,  8>C 

— [ 

2m  J_ 


— 1 + npsqr~8) 


r>  1,  s>0 

— 3/2+ioo 


r(z)n-z(p-*  + q-*)  dz/(l  - p~z  - q~‘) 

3/2  — ioo 

(n/hp)(lnn  + 7 — 1 4-  h^/2hp  — hp  + S(n))  4-  0(1), 
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where  hp  = — (plnp  + glng),  h ^ = p(lnp)2  + g(lng)2,  and  S(n)  = J2T(z)n~1~z/hp 
summed  over  all  complex  z ^ 1 such  that  p~z  + q~z  = 1.  The  latter  set  of  points 
seems  to  be  difficult  to  analyze  in  general;  but  when  p = cf>~1,  q = <t>~2,  the  solutions 
are  z — (— l)fc+1  + kni/\n  <f>.  The  dominant  term,  (nlnn)/hp,  could  also  have  been 
obtained  from  van  Emden’s  general  formula  quoted  in  the  answer  to  exercise  29.  For 
p = 0"1  we  have  l/hp  « 1.503718,  compared  to  l/hi/2  ~ 1.442695. 

54.  Let  C be  a circle  of  radius  (M+ 1)6,  so  that  the  integral  vanishes  onCasM-too. 
(The  asymptotic  form  of  Un  can  now  be  derived  in  a new  way,  expanding  T(n  + 1)/ 
T(n  + ibm).  The  method  of  this  exercise  applies  to  all  sums  of  the  form 

= 2t^/ B(n+1  ,~z)f(z)dz, 

when  / is  reasonably  well  behaved.  The  latter  formula  can  be  found  in  N.  E.  Norlund’s 
Vorlesungen  iiber  Differenzenrechnung  (Berlin:  Springer,  1924),  §103.) 

55.  Replace  lines  04-06  of  Program  Q by 


ENTA 

0,2 

STA 

INPUT, 3 

c<b<a 

JGE 

5F 

INCA 

0,3 

STX 

INPUT, 2 

CMPX 

INPUT, 4 

a<b,c 

SRB 

1 

5H  LDA 

INPUT, 4 

rA<—b 

JGE 

5B 

STA 

*+1(0:2) 

JMP 

6F 

LDA 

INPUT, 3 

a<c<b 

ENT4 

* 

4H  LDA 

INPUT, 3 

b<c<a 

LDX 

INPUT, 4 

LDA 

INPUT, 2 

rA+-a 

LDX 

INPUT, 2 

STX 

INPUT, 3 

LDX 

INPUT, 3 

rX-t— c 

STX 

INPUT, 3 

JMP 

6F 

CMPA 

INPUT, 3 

JMP 

5F 

5H  LDX 

INPUT, 4 

b<a<c 

JL 

IF 

3H  STX 

INPUT, 2 

c<a<b 

STX 

INPUT, 2 

CMPA 

INPUT, 4 

rA:b 

LDX 

INPUT, 4 

6H  LDX 

INPUT+1,2 

JLE 

3F 

STX 

INPUT, 3 

STX 

INPUT, 4 

CMPX 

INPUT, 4 

rX:6 

JMP 

6F 

ENT4 

2,2 

JG 

4F 

1H  CMPA 

INPUT, 4 

ENT5 

0,3 

followed  by  ‘STA  INPUT+1,2’  (see  the  remark  after  (27));  and  change  the  instruction  in 
line  22  to  ‘STX  INPUT+1,2’.  The  first  three  of  these  instructions  should  be  replaced  by 
‘ENTX  0,2;  INCX  0,3;  ENTA  0;  DIV  =2=’  if  binary  shifting  is  not  available. 

This  program  essentially  exchanges  Ri+ 1 with  R\l(i+t)/2\  and  sorts  the  three  records 
Ri,  R1+1,  Rr,  then  applies  normal  partitioning  to  R1+1 . . . Rr-i-  It  is  tempting  to  save 
a few  lines  of  code  by  simply  putting  the  median  element  in  rA,  moving  Ri  to  the 
median’s  former  place,  and  using  Program  Q as  it  stands.  But  such  an  approach  has 
bad  consequences,  since  it  requires  order  N 2 steps  to  sort  the  file  N N—  1 ...  1.  (This 
amazing  result,  first  noticed  by  D.  B.  Coldrick,  has  to  be  seen  to  be  believed  — try  it!) 
The  technique  recommended  above,  due  to  R.  Sedgewick,  appears  to  be  free  of  such 
simple  worst-case  anomalies,  and  runs  faster  too. 

With  this  median-of-three  partitioning  scheme,  the  algorithm  does  not  look  at 
Kn+ 1,  but  it  still  might  examine  Ko  in  step  Q9. 

56.  We  can  solve  the  recurrence  (j)xn  = bn  + 2 5Z£=1(fc  ~ l)(n  ~ for  n > m, 

by  letting  yn  = nxn,  u„  = ny„+ 1 — (n  + 2 )yn,  vn  = nun+ 1 — (n  — 5 )un;  it  follows  that 
vn  = 6(bn+2  — 26n+i  + bn),  for  n > m.  Example:  Let  xn  = <Sni  for  n < m,  and  let 
bn  = 0.  Then  vn  = 0 for  all  n > m,  hence  n-un+ 1 = m-um+ 1.  Since  ym+ 1 = 12 /m  and 
ym+ 2 = 12/(m+l),  we  ultimately  find  xn  = ^(ra+l)/m(m-|-l)(m-(-2)-|-^(m  — l)-/n-, 
for  n > m.  In  general,  let  fn  = (l2/(n  — l)(n  — 2))  J)^=1(k-l)(n-k)xk-i\  the  solution 
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for  n > m when  bn  is  identically  zero  is 

(m  + l)/m+2  - {m  - 4)/m+i  ((m  + l)/m+2-(m  + 3)/m+i)m- 

Xn  = (n  + 1) 7{m  + l)(m'+2) ' 

When  bn  = (")  /n-  and  x„  = 0 for  n < m,  the  solution  is 

xn  = (p  — 3)(p  — 2)  12  1 12  (m  + 1 - p )g=£ 

n + 1 (p  - 6)(p  + l)(n  + l)E±i  + 7 (p  + l)(m  + 2)H±i  7 (p  - 6)(n  + 1)7  ’ 

for  n > m;  except  that  when  p = — 1 we  have  x„/(n  + 1)  = y-(Hn+ i — Hm+2)  + H + 
7|(m  + 2)7/ (n  + 1)7,  and  when  p = 6,  xn/(n  + 1)  = -y-(Hn- e - Hms)/(n  + 1)-  + 
g/(m  + 2)*+g/(n+l)i 

Arguing  as  in  exercises  21-23,  we  find  that  the  first  partitioning  phase  now  con- 
tributes 1 to  A,  t to  B,  and  IV  — 1 to  C,  where  t is  defined  as  before  but  after 
the  rearrangement  made  in  exercise  55.  Under  the  new  assumptions  we  find  bstN  = 
6(‘!/2)  (,v-t’~1) hence  the  recurrence  stated  above  arises  in  the  following  ways: 


Value 

W(s) 

for  N < M 

for  N > M 

Solution  for  N > M 

An 

0 

1 

(N+l)(A7/(M+2))-l  + 0(7V-6) 

Bn 

0 

(N-4)/5 

(Cjv-3A;v)/5 

Cn 

0 

N — l 

(JV  + l)(f  (HN+1  -%+2)+  § - f/(M+2))  + 2+0(N~6) 

Dn 

N-Hn 

0 

(AT+l)(l-^HM+i/(M+2)-f/(M  + 2))+0(lV-6) 

En 

N(N- 1)/4 

0 

(N+l)(^M-i7  + |/(M+2))  + 0(7V-6) 

Similarly  Sn  = |(JV+  l)(5M  + 3)/(2M-|-3)(2M  + 1)  — 1 + 0(A^-6).  The  total  average 
running  time  of  the  program  in  exercise  55  is  53|Ajv  + 1 1 U.v  + 4C'a-  + 3 Dn  + 8 En  + 
9Sn  + 7N;  the  choice  M = 9 is  very  slightly  better  than  M = 10,  producing  an  average 
time  of  approximately  10|§lVlniV  + 2.1167V  [Acta  Inf.  7 (1977),  336-341].  With  DIV 
instead  of  SRB,  add  HAjv  to  the  average  running  time  and  take  M = 10. 

SECTION  5.2.3 

1.  No;  consider  the  case  K\  > K2  = ■ ■ • = Kn  . But  the  method  using  00  (described 
just  before  Algorithm  S)  is  stable. 

2.  Traversing  a linear  list  stored  sequentially  in  memory  is  often  slightly  faster  if  we 
scan  the  list  from  higher  indices  to  lower,  since  it  is  usually  easier  for  a computer  to 
test  if  an  index  is  zero  than  to  test  if  it  exceeds  N.  (For  the  same  reason,  the  search  in 
step  S2  runs  from  j down  to  1;  but  see  exercise  8!) 

3.  (a)  The  permutation  ai . . . ajv-iTV  occurs  for  inputs 

N a,2  . . . cln— 1 ui,  a\N  as  . . . ajv— 1 <12,  •••,  ni  (12  • • • a/v— 27Vajv— 1,  ai . . . aj\r~ \N. 

(b)  The  average  number  of  times  the  maximum  is  changed  during  the  first  iteration 
of  step  S2  is  Hn  — 1,  as  shown  in  Section  1.2.10.  [Hence  Bn  can  be  found  from 
Eq.  1.2.7— (8).] 

4.  If  the  input  is  a permutation  of  {1, 2, ... , N},  the  number  of  times  i = j in  step  S3 
is  exactly  one  less  than  the  number  of  cycles  in  the  permutation.  (Indeed,  it  is  not 
hard  to  show  that  steps  S2  and  S3  simply  remove  element  j from  its  cycle;  hence  S3  is 
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inactive  only  when  j was  the  smallest  element  in  its  cycle.)  By  Eq.  1.3.3-(2i)  we  could 
save  Hn  — 1 of  the  N — 1 executions  of  step  S3,  on  the  average. 

Thus  it  is  inefficient  to  insert  an  extra  test  “i  = j?”  before  step  S3.  Instead  of 
testing  i versus  j,  however,  we  could  lengthen  the  program  for  S2  slightly,  duplicating 
part  of  the  code,  so  that  S3  never  is  encountered  if  the  initial  guess  Kj  is  not  changed 
during  the  search  for  the  maximum;  this  would  make  Program  S a wee  bit  faster. 

5.  (N  - 1)  + (N  - 3)  + • • ■ = [iV2/4j . 

6.  (a)  If  i # y in  step  S3,  that  step  decreases  the  number  of  inversions  by  2 m — 1, 
where  m is  one  more  than  the  number  of  keys  in  Ki+ 1 ...Kj-i  that  lie  between  Kt 
and  Kj\  clearly  m is  not  less  than  the  contribution  to  B on  the  previous  step  S2. 
Now  apply  the  observation  of  exercise  4,  connecting  cycles  to  the  condition  i = j. 
(b)  Every  permutation  can  be  obtained  from  N...2  1 by  successive  interchanges  of 
adjacent  elements  that  are  out  of  order.  (Apply,  in  reverse  sequence,  the  interchanges 
that  sort  the  permutation  into  decreasing  order.)  Every  such  operation  decreases  I by 
one  and  changes  C by  ±1.  Hence  no  permutation  has  a value  of  I — C exceeding  the 
corresponding  value  for  N. . . 2 1.  [By  exercise  5 the  inequality  B < [N2/ 4J  is  best 
possible.] 

7.  A.  C.  Yao,  “On  straight  selection  sort,”  Computer  Science  Technical  Report  185 
(Princeton  University,  1988),  showed  that  the  variance  is  aN1  5 + 0(N1A9&  log  AT), 
where  a = | In  | « 0.9129;  he  also  conjectured  that  the  actual  error  term  is 
significantly  smaller. 

8.  We  can  start  the  next  iteration  of  step  S2  at  position  Ki,  provided  that  we  have 
remembered  max  (K i, . . . , Ki- 1).  One  way  to  keep  all  of  this  auxiliary  information  is 
to  use  a link  table  Li . . . Ln  such  that  Kik  is  the  previous  boldface  element  whenever 
Kk  is  boldface;  L\  — 0.  [We  could  also  get  by  with  less  auxiliary  storage,  at  the  expense 
of  some  redundant  comparisons.] 

The  following  MIX  program  uses  address  modification  so  that  the  inner  loop  is  fast. 


rll 

= j,  rI2 

= k - 

-j,  rI3  = i, 

*1 

> 

III 

* 

01 

START 

ENT1 

N 

1 

j <—  N. 

02 

STZ 

LINK+1 

1 

03 

JMP 

9F 

1 

04 

1H 

ST1 

6F(0:2) 

N-D 

Modify  addresses  in 

loop. 

05 

ENT4 

INPUT ,1 

N-D 

06 

ST4 

7F(0:2) 

N-D 

07 

ENT4 

LINK, 1 

N-D 

08 

ST4 

8F(0:2) 

N-D 

09 

7H 

CMPA 

INPUT+ J , 2 

A 

[Address  modified] 

10 

JGE 

*+4 

A 

Jump  if  Ki  > Kk- 

11 

8H 

ST3 

LINK+J.2 

N+l-C 

Otherwise  Lk  <—  i, 

[Address  modified] 

12 

6H 

ENT3 

J,2 

N+l-C 

i <—  k. 

[Address  modified] 

13 

LDA 

INPUT, 3 

N + l-C 

14 

INC2 

1 

A 

k <—  k + 1. 

15 

J2NP 

7B 

A 

Jump  if  k < j. 

16 

4H 

LDX 

INPUT, 1 

N 

17 

STX 

INPUT, 3 

N 

Ri  i — Rj  . 

18 

STA 

INPUT, 1 

N 

Rj  <—  former  Rt. 

19 

DEC1 

1 

N 

j «-  j - 1. 

20 

ENT2 

0,3 

N 

rI2  ^ — i. 
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21 

LD3  LINK,  3 

TV 

i i — Li . 

22 

J3NZ  5F 

TV 

If  i > 0,  k will  start  at  i. 

23 

9H 

ENT3  1 

C 

Otherwise  *4—1. 

24 

ENT2  2 

c 

k will  start  at  2. 

25 

5H 

DEC2  0,1 

TV  + 1 

26 

LDA  INPUT, 3 

TV  + 1 

rA  4-  Ki. 

27 

J2NP  IB 

TV  + 1 

Jump  if  k < j. 

28 

J1P  4B 

D + 1 

Jump  if  j > 0.  | 

9. 

TV  — 

1 + X^>Jfe>2((^  — 

l)/2  - 1/fc) 

= 1 ((f)  + TV  — Hn-  [The  average  values  of  C 

and  D are,  respectively,  Hn  + 1 and  HN  - §;  hence  the  average  running  time  of  the 
program  is  (1.257V2  + 31.757V  - 15Hjv  + 14.5)  rr.]  Program  H is  much  better. 


087  061  — oo  — oo  — oc  -oo  -oo  -oo 

/\  /\  /\  /\  /\  /\  /\  /\ 


— oo  087  — oo  061  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo 


503  275  653  703 


087  061  170  -oo  426  509  677  -oo 


/\  /\  /\  /\  /\  /\  /\  /\ 

— oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  — oo  154  — oo  612  — oo  — oo  — oo 

12.  2n  — 1,  once  for  each  — oo  in  a branch  node. 

13.  If  K > Kr+ i,  then  step  H4  may  go  to  step  H5  if  j = r.  (Step  H5  is  inactive 
unless  Kr  < Kr+ 1,  when  step  H6  will  go  to  H8  anyway.)  To  ensure  that  K > Kr+ 1 
throughout  the  algorithm,  we  may  start  with  Kn+i  < min(/fi, . . . , Kn)',  instead  of 
setting  Rr  4—  R\  in  step  H2,  set  Rr+i  4—  Rn+i  and  Rn+ i 4—  Ri ; also  set  R2  4—  Rn+i 
after  r = 1.  (This  trick  does  not  speed  up  the  algorithm  nor  does  it  make  Program  H 
any  shorter.) 

14.  When  inserting  an  element,  give  it  a key  that  is  less  (or  greater)  than  all  previously 
assigned  keys,  to  get  the  effect  of  a simple  queue  (or  stack,  respectively). 

15.  For  efficiency,  the  following  solution  is  a little  bit  tricky,  avoiding  all  multiples  of  3 
[CACM  10  (1967),  570]. 

PI.  [Initialize.]  Set  p[l]  4—  2,  p[ 2]  4—  3,  k 4—  2,  n 4—  5,  d 4—  2,  r 4—  1,  t 4—  25,  and 
place  (25, 10, 30)  in  the  priority  queue.  (In  this  algorithm,  p[i\  = ith  prime; 
k = number  of  primes  found  so  far;  n = prime  candidate;  d — distance  to 
next  candidate;  r = number  of  elements  in  the  queue;  t = p[r  + 2]2,  the  next 
n for  which  we  should  increase  r.  The  queue  entries  have  the  form  (it,  v,  6p), 
where  p is  a prime  divisor  of  u,  v = 2 p or  4p,  and  u + v is  not  a multiple  of  3.) 
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P2.  [Advance  q.]  Let  (q.  q' , q")  be  a queue  element  with  the  smallest  first  compo- 
nent. Replace  it  in  the  queue  by  ( q + q , q"  — q' ,q").  (This  denotes  the  next 
multiple  of  q"/ 6 that  must  be  excluded.)  If  n > q.  repeat  this  step  until  n < q. 

P3.  [Check  for  prime  n.]  If  n > N,  terminate  the  algorithm.  Otherwise,  if  n < q, 
set  k 4—  k + 1,  p[k]  4-  n,  n 4—  n + d,  d 4—  6 — d,  and  repeat  this  step. 

P4.  [Check  for  prime  y/n.]  (Now  n = q is  not  prime.)  If  n = t,  set  r 4-  r + 1, 
u 4—  p[r  + 2],  t 4—  u2 , and  insert  (t,2u,6u)  or  (f,4u,6«)  into  the  queue 
according  as  u mod  3 = 2 or  u mod  3 = 1. 

P5.  [Advance  n.]  Set  n 4—  n + d,  d 4—  6 — d,  and  return  to  P2.  | 

Thus  the  computation  begins  as  follows: 

Queue  contents  Primes  found 

(25,  10,  30)  5,  7,  11,  13,  17,  19,  23 

(35,  20,  30)(49,  28,  42)  29,  31 

(49,  28,  42)(55,  10,  30)  37,  41,  43,  47 

(55,  10,  30)(77,  14,  42)(121,  22,  66)  53 

If  the  queue  is  maintained  as  a heap,  we  can  find  all  primes  < N in  0(N  log  N log  log  N) 
steps;  the  length  of  the  heap  is  at  most  the  number  of  primes  < %/iV,  and  the  entry 
for  p is  updated  O(Nfp)  times.  The  sieve  of  Eratosthenes,  as  implemented  in  exercise 
4. 5. 4-8,  is  a 0(N  log  log  N)  method  requiring  considerably  more  random  access  storage. 
More  efficient  implementations  are  discussed  in  Section  7.1.3. 

16.  II.  [Make  a new  leaf  j.]  Set  K 4—  key  to  be  inserted;  j <—  n + 1 . 

12.  [Find  parent  of  j.\  Set  i 4-  [j/2] . 

13.  [Done?]  If  i = 0 or  Ki  > K,  set  Kj  4-  K and  terminate  the  algorithm. 

14.  [Sift  and  move  j up.]  Set  Kj  4—  Ki,  j 4—  i,  and  return  to  12.  | 

[T.  Porter  and  I.  Simon  showed  in  IEEE  Trans.  SE-1  (1975),  292-298,  that  if  An+i 
denotes  the  average  number  of  times  step  4 is  executed,  given  a random  heap  of 
uniformly  random  numbers,  we  have  An  = [lgnj  + (1  — n~1)Ani  for  n > 1,  where 
n = (lbi-ibi-2  ■ ■ ■ bo)2  implies  n'  = (lf>;_2  . . . 60)2-  If  l = UsnJ>  value  is  always 

> A2l+i_1  = (2,+1  - 2)/(2'+1  - 1),  and  always  < A2i  < a,  where  a is  the  constant 

in  (19).] 

17.  The  file  12  3 goes  into  the  heap  3 2 1 with  Algorithm  H,  but  into  3 12  with 
exercise  16.  [Note:  The  latter  method  of  heap  creation  has  a worst  case  of  order 
N log  N\  but  empirical  tests  have  shown  that  the  average  number  of  iterations  of  step  2 
during  the  creation  of  a heap  is  less  than  about  2.28N,  for  random  input.  R.  Hayward 
and  C.  McDiarmid  [J.  Algorithms  12  (1991),  126-153]  have  proved  rigorously  that  the 
constant  of  proportionality  lies  between  2.2778  and  2.2994.] 

18.  Delete  step  H6,  and  replace  H8  by: 

H8'.  [Move  back  up.]  Set  j 4—  i,  i 4—  [j/2j. 

H9'.  [Does  K fit?]  If  K < Ki  or  j = l,  set  R3  4—  R and  return  to  H2.  Otherwise 

set  Rj  4 — Rt  and  return  to  H8'.  | 

The  method  is  essentially  the  same  as  in  exercise  16,  but  with  a different  starting  place 
in  the  heap.  The  net  change  to  the  file  is  the  same  as  in  Algorithm  H.  Empirical  tests 
on  this  method  show  that  the  number  of  times  Rj  4—  R,  occurs  per  siftup  during  the 
selection  phase  is  (0,1,2)  with  respective  probabilities  (.837,  .135,  .016).  This  method 


5.2.3 


ANSWERS  TO  EXERCISES  643 


makes  Program  H somewhat  longer  but  improves  its  asymptotic  speed  to  (13iVlg  N + 
0(N))u.  A MIX  instruction  to  halve  the  value  of  an  index  register  would  be  desirable. 

C.  J.  H.  McDiarmid  and  B.  A.  Reed  [J.  Algorithms  10  (1989),  352-365]  have 
proved  that  this  modification  also  saves  an  average  of  (3/3  — 8 )N  fts  0.232 N comparisons 
during  the  heap-creation  phase,  where  (3  is  defined  in  the  answer  to  exercise  27.  For 
further  analysis  of  Floyd’s  improvement,  see  I.  Wegener,  Theoretical  Comp.  Sci.  118 
(1993),  81-98. 

J.  Wu  and  H.  Zhu  [J.  Comp.  Sci.  and  Tech.  9 (1994),  261-266]  have  observed  that 
binary  search  can  also  be  used,  so  that  each  siftup  of  the  selection  phase  involves  at 
most  lg  N + lg  lg  N comparisons  and  lg  N moves. 

19.  Proceed  as  in  the  revised  siftup  algorithm  of  exercise  18,  with  K = Kn,  l = 1,  and 
r = N — 1,  starting  with  a given  value  of  j in  step  H3. 

20.  For  0 < k < n,  the  number  of  positive  integers  < N whose  binary  representation 
has  the  form  (bn  . . . bka\ . . . aq)2  for  some  q > 0 is  clearly  (bk-i  ■ ■ ■ feo)2+l+X]o<g<k  29  = 
(l&fc-i  • • • &o)2- 

21.  Let  j = (cr  ...  Co)  2 be  in  the  range  [N/  2fc+1J  = (b„  . . . bk+ 1)2  < j < (b„  . . . bk)  2 = 
[N/ 2k\.  Then  Sj  is  the  number  of  positive  integers  < N whose  binary  representation 
has  the  form  (cr  . . . CoOi . . . aq)2  for  some  q > 0,  namely  ^20<q<k  2q  = 2k+1  — 1.  Hence 
the  number  of  nonspecial  subtrees  of  size  2k+1  — 1 is 

[N/2k\  - \_N/2k+1\  - 1 = [(N  - 2fc)/2fc+1J . 

[To  prove  the  latter  identity,  use  the  replicative  law  in  exercise  1.2.4-38  with  n = 2 and 
x = N/2k+1.] 

22.  The  five  possibilities  before  l = 1 are  53412,  35412,  43512,  1543  2,  and 
2 5413.  Each  of  these  possibilities  0111203 <14 as  leads  to  three  possible  permutations 
01O2O3O4O5,  O1O4O3O2O5,  O1O5O3O4O2  before  1 = 2. 

23.  (a)  After  B iterations,  j > 2 Bl\  hence  2 Bl  < r.  (b)  We  have  Yn=i  Ll°g2  (TV/ /) J = 

( LTV/2  j - LJV/4J)  + 2(  [TV/4J  - (N/8\)  + 3(LTV/8j  - [TV/16J)  + • • • = LJV/2]  + [JV/4J  + 
[7V/8]  +•  • • = N — v(N),  where  v(N)  is  the  number  of  ones  in  the  binary  representation 
of  N.  Also  by  exercise  1.2.4-42  we  have  [lgr]  = N[lg  N\  — 2^lg  N^+1+2.  We  know 

by  Theorem  H that  this  upper  bound  on  B is  best  possible  during  the  heap-creation 
phase.  Furthermore  it  is  interesting  to  note  that  there  is  a unique  heap  containing  the 
keys  {1,2,...,  N}  such  that  K is  identically  equal  to  1 throughout  the  selection  phase 
of  Algorithm  H.  (For  example,  when  N = 7 that  heap  is  7 5 6 2 4 3 1;  it  is  not  difficult 
to  pass  from  N to  N + 1.)  This  heap  gives  the  maximum  value  of  B (as  well  as  the 
maximum  value  \N/2]  — 1 of  D)  for  the  selection  phase  of  heapsort,  so  the  best  possible 
upper  bound  on  B for  the  entire  sort  is  N — v(N)  + N [lg  N\  — 2'-lg  N'+1  + 2. 

24.  J2k=i  LlgfcJ2  = (N+l-  2 n)n2  + Zo<k<nk22k  = (N  + 1 )n2  - (2  n - 3)2"+1  - 6, 
where  n = [lg  jVJ  (see  exercise  4.5.2-22);  hence  the  variance  of  the  last  siftup  is  0n  = 
((AT  + l)n2  - (2 n - 3)2n+1  - 6)/N  - ((N  + T)n  + 2 - 2 n+1f/N2  = 0(1).  The  standard 
deviation  of  B'N  is  (^2{0S  \ s £ Mn})1^2  = O(VN). 

25.  The  siftup  is  “uniform,”  and  each  comparison  Kj-.Kj+i  has  probability  | of 
coming  out  < . The  average  contribution  to  C in  this  case  is  just  one-half  the  sum 
of  the  average  contributions  to  A and  B.  namely  ((2n  — l)2n_1  + |)/ (2n+1  — 1). 

26.  (a)  (§§ + ± + If + ± + l§+ If +2i  + § + l§  + l|+ 2§ + 14  + 2 + 2 + 3 + 0 + 
1 + 1 + 2 + 1 + 2 + 2 + 3 + 1 + 2 + 2)/26  = 1189/780  » 1.524. 
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(b)  (Y,k=i1'(k)-N+2lN/2\-ln  + 'El=l  min(afc_i,  ak-ak-X-l)/(ak-\))/N, 
where  n(k)  is  the  number  of  one  bits  in  the  binary  representation  of  k,  and  ak  = 
(1  bk  . . . bo)2 ■ If  N = 2ei  + 262  + • • • + 2et,  with  ei  > > • • • > et  > 0,  it  can  be 

shown  that  XlitLo  v(k)  = |((ei  + 2)2ei  + (e2  + 4)2e2  + • • • + (et  + 2t)2e‘)  + t - N. 
[The  asymptotic  properties  of  such  sums  can  be  analyzed  perspicuously  with  the 
help  of  Mellin  transforms;  see  Flajolet,  Grabner,  Kirschenhofer,  Prodinger,  and  Tichy, 
Theoretical  Comp.  Sci.  123  (1994),  291-314.] 

27.  J.  W.  Wrench,  Jr.  has  observed  that  the  general  Lambert  series  5Zn>i  anXn/(l—xn) 
can  be  expanded  as  ]C;v>i(I3d\jv  ad)xN  = £m>i(a™  + T,k>i(am  + am+k)xkm)  x™2 . 

[The  cases  a„  = 1 and  an  = n were  introduced  by  J.  H.  Lambert  in  his  Anlage 
zur  Architectonic  2 (Riga:  1771),  §875;  Clausen  stated  his  formula  for  the  case  a„  = 1 
in  Crelle  3 (1828),  95,  and  H.  F.  Scherk  presented  a proof  in  Crelle  9 (1832),  162-163. 
When  an  = n and  x = | we  obtain  the  relation 


= 2.74403  38887  59488  36048  02148  91492  27216  43114+; 

this  constant  arises  in  (20),  where  we  have  B'N  ~ (/3  — 2 )N  and  C'N  ~ (|/3- i)A7.] 
Incidentally,  if  we  set  q — x and  z = xy  in  the  first  identity  of  exercise  5.1.1-16, 
then  evaluate  Jb  at  y = 1,  we  get  the  interesting  identity 


E 


x 


n 


1 — Xn 


1 - xk+1){l  - xk+2) . . . . 

k>i 


28.  The  children  of  node  k are  nodes  3k  — 1,  3fc,  and  3fc  + l;  the  parent  is  [(fc  + l)/3j.  A 
MIX  program  analogous  to  Program  H takes  asymptotically  21 1 /V  log  N w 13.7iV  lg  N 
units  of  time.  Using  the  idea  of  exercise  18  lowers  this  to  18^N  \og3  N « 11.81V  lg  N, 
although  the  division  by  3 will  add  a large  0(1V)  term. 

For  further  information  about  t-ary  heaps,  see  S.  Okoma,  Lecture  Notes  in  Comp. 
Sci.  88  (1980),  439-451. 

30.  Suppose  n = 2*  — 1+r,  where  t = [lg nj  and  1 < r < 2* . Then  /12m  = [m  = 0]  and 

t-2 

b(n+l)m  < ^ ' (2  l)^n(m—  j)  "b  2 ^n(m— 1+1)  A — t)  for  Tl  > 2, 

1=0 

by  considering  the  number  of  elements  on  level  j that  could  be  the  final  resting  place 
of  Kn+ 1 after  it  has  been  sifted  up  in  place  of  Kx.  Therefore,  if  gnm  = hnrn/ 2m,  we 
have 

2j  — 1 r 

9(n+l)m  — / v ^ “ 9n(m—j)  ~b  9n(m  — t+1)  *b  Qn(m  — t)  0§(^  T 1))  ma X , 

— * ZJ  Z m>  0 

1=0 

and  it  follows  by  induction  that  gnrn.  < Ln  = 111=2  lg&- 

The  average  total  number  of  promotions  during  the  selection  phase  is  B'^  = 
h^1  yZm>o  mhNm,  where  hn  = Yhm>o  h-Nm  is  the  total  number  of  possible  heaps 
(Theorem  H).  We  know  that  B'^  < AT[lg  N] . On  the  other  hand,  we  have  B'pj  > 
m — hjif1  — k)hffk  > m-h^Lx  Yl’k=i(m~  k)2k  > m — 2m+1  h^1  Ln , for  all  m. 

Choosing  m = lg(/ijv/Lw)  + 0(1)  now  gives  B'x  > lg (Hn/Ln)  + 0(1). 
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The  number  of  comparisons  needed  to  create  a heap  is  at  most  2 N,  by  exercise 
23(b);  hence  Hn  > N\/22N.  Clearly  Ljv  < (lg N)N , so  we  have  lg(hjv/Ljv)  > NlgN  — 
N lglg N + 0(N).  [J.  Algorithms  15  (1993),  76-100.] 

31.  (Solution  by  J.  Edighoffer,  1981.)  Let  A be  an  array  of  2 n elements  such  that 
A[2[»/2J]  < A[2i\  and  A[2[i/2J  — l]  > A[2i  — 1]  for  1 < i < n;  furthermore  we  require 
that  A[2i  — 1]  > A[2i]  for  1 < i < n.  (The  latter  condition  holds  for  all  i if  and  only 
if  it  holds  for  n/2  < i < n,  because  of  the  heap  structure.)  This  “twin  heap”  contains 
2 n elements;  to  handle  an  odd  number  of  elements,  we  simply  keep  one  element  off 
to  the  side.  Appropriate  modifications  of  the  other  algorithms  in  this  section  can  be 
used  to  maintain  twin  heaps,  and  it  is  interesting  to  work  out  the  details.  This  idea 
was  independently  discovered  and  developed  further  by  J.  van  Leeuwen  and  D.  Wood 
[Comp.  J.  36  (1993),  209-216],  who  called  the  structure  an  “interval  heap.” 

32.  In  any  heap  of  N distinct  elements,  the  largest  m = [ N/2]  elements  form  a subtree. 

At  least  [m/2j  of  them  must  be  nonleaves  of  that  subtree,  since  a binary  tree  with  k 
leaves  has  at  least  k — 1 nonleaves.  Therefore  at  least  [m/2j  of  the  largest  m elements 
appear  in  the  first  [N/2\  positions  of  the  heap.  Those  elements  must  be  promoted  to 
the  root  position  before  reaching  their  final  destinations;  so  their  movement  contributes 
at  least  (lg  k\  = |mlgm  + O(m)  to  B,  by  exercise  1.2.4-42.  Thus  Bmin(N)  > 

4 IV lg  N + 0(N)  + Bmm{[N/2\),  and  the  result  follows  by  induction  on  N.  [I.  Wegener, 
Theoretical  Comp.  Sc i.  118  (1993),  81-98,  Theorem  5.1.  Schaffer  and  Sedgewick,  and 
independently  Bollobas,  Fenner,  and  Frieze,  have  constructed  permutations  that  require 
no  more  than  |jV  lg  N + 0(N  log  log  N)  promotions;  see  J.  Algorithms  15  (1993),  76- 
100;  20  (1996),  205-217.  Such  permutations  are  quite  rare,  by  the  result  of  exercise  30.] 

33.  Let  P and  Q point  to  the  given  priority  queues.  The  following  algorithm  uses  the 
convention  DIST(A)  = 0,  as  in  the  text,  although  A isn’t  really  a node. 

Ml.  [Initialize.]  Set  R 4—  A. 

M2.  [List  merge.]  If  Q = A,  set  D 4—  DIST(P)  and  go  to  M3.  If  P = A,  set 
P <—  Q,  D 4—  DIST(P),  and  go  to  M3.  Otherwise  if  KEY(P)  > KEY(Q),  set 
T 4-  RIGHT (P) , RIGHT (P)  4-  R,  R 4-  P,  P 4-  T and  repeat  step  M2.  If 
KEY (P)  < KEY(Q),  set  T 4-  RIGHT (Q),  RIGHT (Q)  4-  R,  R 4-  Q,  Q 4-  T and 
repeat  step  M2.  (This  step  essentially  merges  the  two  “right  lists”  of  the 
given  trees,  temporarily  inserting  upward  pointers  into  the  RIGHT  fields.) 
M3.  [Done?]  If  R = A,  terminate  the  algorithm;  P points  to  the  answer. 

M4.  [Fix  DISTs.]  Set  Q 4-  RIGHT (R).  If  DIST(LEFTCR) ) < D,  then  set  D 4- 
DIST(LEFTCR))  + 1,  RIGHT  (R)  <-  LEFT(R) , LEFT  (R)  <-  P;  otherwise  set 
D 4-  D + 1,  RIGHT (R)  4-  P.  Finally  set  DIST(R)  4-  D,  P 4-  R,  R 4-  Q, 
and  return  to  M3.  | 

34.  Starting  with  the  recurrence 


(m  — 1 

Hz)  - Lk(z) 

k= 1 


for  parts  of  the  overall  generating  function  L(z)  = Yln> o = ^m>i  Lm(z),  where 
Lm(z)  = z2’n~1  + generates  leftist  trees  with  shortest  path  length  m from  root 
to  A,  Rainer  Kemp  has  proved  that  L(z)  = z + \L(z)2  + §Sm>i  Lm(z)2 , and  that 
o « 0.25036  and  b « 2.7494879  [Inf.  Proc.  Letters  25  (1987),  227-232;  Random 
Graphs  ’87  (1990),  103-130].  Luis  Trabb  Pardo  noticed  in  1978  that  the  generating 
function  G(z ) = zL(z)  satisfies  the  elegant  relation  G(z)  — z + G(zG(z)). 
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35.  Let  the  DIST  field  of  the  deleted  node  be  do,  and  let  the  DIST  field  of  the  merged 
subtrees  be  d\.  If  d,0  = di,  we  need  not  go  up  at  all.  If  do  > d\ , then  di  = dQ  — 1;  and 
if  we  go  up  n levels,  the  new  DIST  fields  of  the  ancestors  of  P must  be,  respectively, 
d\  + 1,  di  + 2,  . . . , d\  + n.  If  do  < di,  the  upward  path  must  go  only  leftwards. 

36.  Instead  of  a general  priority  queue,  it  is  simplest  to  use  a doubly  linked  list;  move 
nodes  to  one  end  of  the  list  whenever  they  are  used,  and  delete  nodes  from  the  other 
end.  [See  the  discussion  of  self-organizing  files  in  Section  6.1.] 

37.  In  an  infinite  heap,  the  fcth-largest  element  is  equally  likely  to  appear  in  the  left  or 
the  right  subheap  of  its  larger  ancestors.  Thus  we  can  use  the  theory  of  digital  search 
trees,  obtaining  e(k)  — Ck  - Ck- i in  the  notation  of  Eq.  6.3-(i3).  By  exercise  6.3-28 
we  have  e(k)  = lg  k + j/ (In  2)  + | — a + (5o(fc)  +0(fc"'1)  « lg  k — .274,  where  a is  defined 
in  (19)  and  So(k)  is  a periodic  function  of  lg  k.  [P.  V.  Poblete,  BIT  33  (1993),  411-412.] 

38.  Mo  = 0;  Mi  = {1};  Mn  = {N}  l±l  M2k_1  l±l  MN_2k  for  N > 1,  where  k = 
Llg(2iV/3)J. 


SECTION  5.2.4 

1.  Start  with  n = • • • = ik  = 1,  j = 1.  Repeatedly  find  min^iq, . . . ,xkik)  = xTir, 
and  set  Zj  = xrir , j «—  j + 1,  ir  «—  ir  + 1.  (In  this  case  the  use  of  +1)  = 00  is  a 
decided  convenience.) 

When  k is  moderately  large,  it  is  desirable  to  keep  the  keys  , . . . , xklk  in  a tree 
structure  suited  to  repeated  selection,  as  discussed  in  Section  5.2.3,  so  that  only  ]_lg  fcj 
comparisons  are  needed  to  find  the  new  minimum  each  time  after  the  first.  Indeed,  this 
is  a typical  application  of  the  principle  of  “smallest  in,  first  out”  in  a priority  queue. 
The  keys  can  be  maintained  as  a heap,  and  00  can  be  avoided  entirely.  See  the  further 
discussion  in  Section  5.4.1. 


2.  Let  C be  the  number  of  comparisons;  we  have  C = m + n — S,  where  S is  the 
number  of  elements  transmitted  in  step  M4  or  M6.  The  probability  that  S > s is  easily 
seen  to  be 


qs  = 


(r+;-‘M"+r*)) 


for  1 < s < m + n;  qa  = 0 for  s > m + n.  Hence  the  mean  of  S is  p m„  = q\  + <72  + ■ • ■ = 
m/(n  + 1)  + n/(m  + 1)  [see  exercises  3.4. 2-5,  6],  and  the  variance  is  a2mn  = (q\  + 3 q2  + 
5q3_| )~Hmn  = Jn(2m  + n)/(n  + l)(n  + 2)  + (m  + 2n)n/(m  + l)(m  + 2)-p^n.  Thus 


C = (min  min(m,n),  avem  + ti  - pmn,  maxra  + ti-1,  devcrmn). 


When  m = n the  average  was  first  computed  by  H.  Nagler,  CACM  3 (1960),  618-620; 
it  is  asymptotically  2n  — 2 + 0(n-1),  with  a standard  deviation  of  \fl  + 0(n-1).  Thus 
C hovers  close  to  its  maximum  value. 

3.  M2'.  If  Ki  < K'j,  go  to  M3';  if  = R',  go  to  M7';  if  Kt  > K\,  go  to  M5'. 

M7'.  Set  Kk  <—  K'j , k <—  k + 1,  i ■<—  i + 1,  j 4—  j + 1.  If  i > M,  go  to  M4';  otherwise 
if  j > N,  go  to  M6';  otherwise  return  to  M2'.  | 

(Appropriate  modifications  are  made  to  other  steps  of  Algorithm  M.  Again  many 
special  cases  disappear  if  we  insert  artificial  keys  KM+1  = K'n+i  = 00  at  the  end  of 
the  files.) 

4.  The  sequence  of  elements  that  appears  at  a fixed  internal  node  of  the  selection 
tree,  as  time  passes,  is  obtained  by  merging  the  sequences  of  elements  that  appear  at 
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the  children  of  that  node.  (The  discussion  in  Section  5.2.3  is  based  on  selecting  the 
largest  element,  but  it  could  equally  well  have  reversed  the  order.)  So  the  operations 
involved  in  tree  selection  are  essentially  the  same  as  those  involved  in  merging,  but 
they  are  performed  in  a different  sequence  and  using  different  data  structures. 

Another  relation  between  merging  and  tree  selection  is  indicated  in  exercise  1. 
Note  that  an  N- way  merge  of  one-element  files  is  a selection  sort;  compare  also  four- 
way merging  of  (A,B,C,D)  to  two-way  merging  of  (A,B),  ( C,D ),  then  ( AB,CD ). 

5.  In  step  N6  we  always  have  Ki  < Ki-i  < Kj ; in  N10,  Kj  < Kj+i  < Ki. 

6.  For  example,  2 6 4 10  8 14  12  16  15  11  13  7 9 3 5 1;  after  one  pass,  two  of  the 
expected  stepdowns  disappear:  1 2 5 6 7 8 13  14  16  15  12  11  10  9 4 3.  This  possibility 
was  first  noted  by  D.  A.  Bell,  Comp.  J.  1 (1958),  74.  Quirks  like  this  make  it  almost 
hopeless  to  carry  out  a precise  analysis  of  Algorithm  N. 

7. $\lceil \lg N\rceil$, if $N > 1$. (Consider how many times p must be doubled until it is $\ge N$.)
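A quick way to see the pass count is a bottom-up straight merge in Python, where runs of length p are merged into runs of length 2p on each pass. This is a simplified, one-directional model of our own, not Algorithm S itself. Note that $\lceil \lg N\rceil$ equals `(N - 1).bit_length()` for $N \ge 1$.

```python
def straight_merge_passes(keys):
    """Bottom-up straight two-way merging; returns (sorted list, number of passes)."""
    a = list(keys)
    n = len(a)
    p, passes = 1, 0
    while p < n:                        # p is doubled once per pass
        merged = []
        for lo in range(0, n, 2 * p):
            left, right = a[lo:lo + p], a[lo + p:lo + 2 * p]
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i])
                    i += 1
                else:
                    merged.append(right[j])
                    j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
        a = merged
        p *= 2
        passes += 1
    return a, passes
```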

8. If N is not a multiple of 2p, there is one short run on the pass, and it is always near the middle; letting its length be t, we have $0 \le t < p$. Step S12 handles the cases where the short run is to be “merged” with an empty run, or where t = 0; otherwise we have essentially $x_1 \le x_2 \le \cdots \le x_p \mid y_t \ge \cdots \ge y_1$. If $x_p \le y_t$, the left-hand run is exhausted first, and step S6 will take us to S13 after $x_p$ has been transmitted. On the other hand, if $x_p > y_t$, the right-hand side will be artificially exhausted, but $K_j = x_p$ will never be $< K_i$ in step S3! Thus S6 will eventually take us to S13 in all cases.

10. For example, Algorithm M can merge elements $x_{j+1} \ldots x_{j+m}$ with $x_{j+m+1} \ldots x_{j+m+n}$ into positions $x_1 \ldots x_{m+n}$ of an array without conflict, if $j \ge n$. With care we can exploit this idea so that $N + 2^{\lceil \lg N\rceil - 1}$ locations are required for an entire sort. But the program seems to be rather complicated compared to Algorithm S. [Comp. J. 1 (1958), 75; see also L. S. Lozinskii, Kibernetika 1, 3 (1965), 58–62.]

11.  Yes.  This  can  be  seen,  for  example,  by  considering  the  relation  to  tree  selection 
mentioned  in  exercise  4.  But  Algorithms  N and  S are  obviously  not  stable. 

12. Set $L_0 \leftarrow 1$, $t \leftarrow N + 1$; then for $p = 1, 2, \ldots, N - 1$, do the following:

If $K_p \le K_{p+1}$ set $L_p \leftarrow p + 1$; otherwise set $L_t \leftarrow -(p + 1)$, $t \leftarrow p$.

Finally, set $L_t \leftarrow 0$, $L_N \leftarrow 0$, $L_{N+1} \leftarrow |L_{N+1}|$.

(Stability is preserved. The number of passes is $\lceil \lg r\rceil$, where r is the number of ascending runs in the input; the exact distribution of r is analyzed in Section 5.1.3. We may conclude that natural merging is preferable to straight merging when linked allocation is being used, although it was inferior for sequential allocation.)
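The pass count $\lceil \lg r\rceil$ can be checked with a list-based natural merge in Python. This sketch models only the run structure and the number of passes, not the linked allocation of Algorithm L, and its function names are ours.

```python
def ascending_runs(keys):
    """Split keys into maximal nondecreasing runs."""
    runs = []
    for k in keys:
        if runs and runs[-1][-1] <= k:
            runs[-1].append(k)
        else:
            runs.append([k])
    return runs


def merge(a, b):
    """Stable merge: on equal keys the element of a comes first."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def natural_merge_sort(keys):
    """Merge adjacent runs pairwise until one run remains;
    returns (sorted list, number of passes)."""
    runs = ascending_runs(keys) or [[]]
    passes = 0
    while len(runs) > 1:
        runs = [merge(runs[t], runs[t + 1]) if t + 1 < len(runs) else runs[t]
                for t in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes
```

With r runs, each pass roughly halves the count, so the number of passes is `(r - 1).bit_length()`, that is, $\lceil \lg r\rceil$.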

13. The running time for $N > 3$ is $(11A + 6B + 3B' + 9C + 2C'' + 4D + 5N + 9)u$, where A is the number of passes; $B = B' + B''$ is the number of subfile-merge operations performed, where B′ is the number of such merges in which the p subfile was exhausted first; $C = C' + C''$ is the number of comparisons performed, where C′ is the number of such comparisons with $K_p \le K_q$; $D = D' + D''$ is the number of elements remaining in subfiles when the other subfile has been exhausted, where D′ is the number of such elements belonging to the q subfile. In Table 3 we have A = 4, B′ = 6, B″ = 9, C′ = 22, C″ = 22, D′ = 10, D″ = 10, total time = 761u. (The comparable Program 5.2.1L takes only 433u, when improved as in exercise 5.2.1–33, so we can see that merging isn’t especially efficient when N is small.)

Algorithm L does a sequence of merges on subfiles whose sizes (m, n) can be determined as follows: Let $N - 1 = (b_k \ldots b_1 b_0)_2$ in binary notation. There are $(b_k \ldots b_{j+1})_2$ “ordinary” merges with $(m, n) = (2^j, 2^j)$, for $0 \le j \le k$; and there are “special” merges with $(m, n) = \bigl(2^j, 1 + (b_{j-1} \ldots b_0)_2\bigr)$ whenever $b_j = 1$, for $0 \le j \le k$. For example, when N = 14 there are six ordinary (1, 1) merges, three ordinary (2, 2) merges, one ordinary (4, 4) merge, and the special merges deal with subfiles of sizes (1, 1), (4, 2), (8, 6). The multiset $M_N$ of merge sizes (m, n) can also be described by the recurrence relations

$$M_1 = \emptyset; \qquad M_{2^k + r} = \{(2^k, r)\} \uplus M_{2^k} \uplus M_r \quad \text{for } 0 < r \le 2^k.$$

It follows that, regardless of the input distribution, we have $A = \lceil \lg N\rceil$, $B = N - 1$,

$$C' + D'' = \sum_{j=0}^{k} b_j 2^j\bigl(1 + \tfrac12 j\bigr), \qquad C'' + D' = \sum_{j=0}^{k} b_j\Bigl(1 + 2^j\bigl(\tfrac12 j + b_{j+1} + \cdots + b_k\bigr)\Bigr);$$

hence only B′, C′, D′ need to be analyzed further.
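The two descriptions of $M_N$ and the two sums can be cross-checked mechanically. The Python sketch below builds $M_N$ both from the recurrence and from the binary digits of $N - 1$. It then compares $\sum m$ and $\sum n$ over the multiset with the closed forms for $C' + D''$ and $C'' + D'$, reading m as the size of the p subfile. The helper names are ours.

```python
from collections import Counter
from fractions import Fraction


def merge_sizes_recurrence(N):
    """M_N via M_1 = {} and M_{2^k + r} = {(2^k, r)} + M_{2^k} + M_r, 0 < r <= 2^k."""
    if N <= 1:
        return Counter()
    big = 1 << ((N - 1).bit_length() - 1)        # the 2^k with 2^k < N <= 2^(k+1)
    r = N - big
    return Counter({(big, r): 1}) + merge_sizes_recurrence(big) + merge_sizes_recurrence(r)


def merge_sizes_binary(N):
    """M_N from the binary digits b_k ... b_0 of N - 1."""
    c = Counter()
    for j in range((N - 1).bit_length()):
        ordinary = (N - 1) >> (j + 1)            # (b_k ... b_{j+1})_2
        if ordinary:
            c[(1 << j, 1 << j)] += ordinary
        if (N - 1) >> j & 1:                     # b_j = 1: one special merge
            c[(1 << j, 1 + ((N - 1) & ((1 << j) - 1)))] += 1
    return c


def sum_m_formula(N):
    """C' + D'' = sum_j b_j 2^j (1 + j/2)."""
    return sum(Fraction(2 ** j * (2 + j), 2)
               for j in range((N - 1).bit_length()) if (N - 1) >> j & 1)


def sum_n_formula(N):
    """C'' + D' = sum_j b_j (1 + 2^j (j/2 + b_{j+1} + ... + b_k))."""
    total = Fraction(0)
    for j in range((N - 1).bit_length()):
        if (N - 1) >> j & 1:
            higher_ones = bin((N - 1) >> (j + 1)).count("1")
            total += 1 + 2 ** j * (Fraction(j, 2) + higher_ones)
    return total
```

For N = 14 both constructions give seven (1, 1) merges (six ordinary, one special), three (2, 2), one (4, 4), one (4, 2) and one (8, 6).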

If  the  input  to  Algorithm  L is  random,  each  of  the  merging  operations  satisfies 
the  conditions  of  exercise  2,  and  is  independent  of  the  behavior  of  the  other  merges; 
so  the  distribution  of  B',  C',  D'  is  the  convolution  of  their  individual  distributions 
for  each  subfile  merge.  The  average  values  for  such  a merge  are  B'  = n/(m  + n), 
C'  = mn/(n  + 1),  D'  = n/(m  + 1).  Sum  these  over  all  relevant  (m,  n)  to  get  the  exact 
average  values. 

When $N = 2^k$ we have, of course, the simplest situation; $B'_{\rm ave} = \frac12 B$, $C'_{\rm ave} = \frac12 C_{\rm ave}$, $C + D = kN$, and $D_{\rm ave} = \sum_{j=1}^{k}\bigl(2^{k-j}\,2^j/(2^{j-1} + 1)\bigr) = \alpha' N + O(1)$, where

$$\alpha' = \sum_{n\ge0}\frac{1}{2^n + 1} = \frac12 + \sum_{n\ge1}\frac{(-1)^{n+1}}{2^n - 1} = 1.26449\ 97803\ 48444\ 20919\ 13197\ 47255\ 49848\ 25577-;$$

this constant can be evaluated to high precision as in exercise 5.2.3–27. This special case was first analyzed by A. Gleason [unpublished, 1956] and H. Nagler [CACM 3 (1960), 618–620].
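A short numerical check, sketched in Python with the standard `decimal` module. It sums both series for α′ to about 40 digits and confirms that $D_{\rm ave}$ stays within O(1) of $\alpha' N$ for $N = 2^k$. The function names and the precision settings are our own choices.

```python
from decimal import Decimal, getcontext
from fractions import Fraction


def alpha_prime(digits=40):
    """sum_{n>=0} 1/(2^n + 1), summed until the terms are negligible."""
    getcontext().prec = digits + 10
    eps = Decimal(10) ** -(digits + 5)
    total, n = Decimal(0), 0
    while True:
        term = Decimal(1) / (2 ** n + 1)
        if term < eps:
            return total
        total += term
        n += 1


def alpha_prime_alternating(digits=40):
    """1/2 + sum_{n>=1} (-1)^(n+1) / (2^n - 1)."""
    getcontext().prec = digits + 10
    eps = Decimal(10) ** -(digits + 5)
    total, n = Decimal(1) / 2, 1
    while True:
        term = Decimal(1) / (2 ** n - 1)
        if term < eps:
            return total
        total += term if n % 2 else -term
        n += 1


def d_average(k):
    """Exact D_ave for N = 2^k: sum_{j=1}^{k} 2^(k-j) * 2^j / (2^(j-1) + 1)."""
    return sum(Fraction(2 ** (k - j) * 2 ** j, 2 ** (j - 1) + 1) for j in range(1, k + 1))
```

Since $\alpha' N - D_{\rm ave} = N\sum_{j\ge k} 1/(2^j + 1) < 2$, the difference really is bounded.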

14.  Set  D = B in  exercise  13  to  maximize  C.  [A  detailed  analysis  of  Algorithm  L has 
been  carried  out  by  W.  Panny  and  H.  Prodinger,  Algorithmica  14  (1995),  340-354.] 

15. Make extra copies of steps L3, L4, L6 for the cases that $L_s$ is known to equal p or q. [A further improvement can also be made, removing the assignment $s \leftarrow p$ (or $s \leftarrow q$) from the inner loop, by simply renaming the registers! For example, change lines 20 and 21 to ‘LD3 INPUT,1(L)’ and continue with p in rI3, s in rI1 and $L_s$ known to equal p. With eighteen copies of the inner loop, corresponding to the different permutations of (p, q, s) with respect to (rI1, rI2, rI3), and to different knowledge about $L_s$, we can cut the average running time to $(8N\lg N + O(N))u$.]

16.  (The  result  will  be  slightly  faster  than  Algorithm  L;  see  exercise  5.2.3-28.) 

17.  Consider  the  new  record  as  a subfile  of  length  1.  Repeatedly  merge  the  smallest 
two  subfiles  if  they  have  the  same  length.  (The  resulting  sorting  algorithm  is  essentially 
the  same  as  Algorithm  L,  but  the  subfiles  are  merged  at  different  relative  times.) 
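A minimal Python sketch of this incremental scheme. It keeps a stack of sorted subfiles whose lengths always match the binary representation of the number of records seen so far. The class and method names are hypothetical.

```python
def merge(a, b):
    """Stable two-way merge; on equal keys the element of a comes first."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


class OnlineMergeSorter:
    """Accepts records one at a time; subfile lengths on the stack strictly
    decrease toward the top."""

    def __init__(self):
        self.stack = []

    def insert(self, key):
        self.stack.append([key])            # the new record is a subfile of length 1
        while len(self.stack) >= 2 and len(self.stack[-1]) == len(self.stack[-2]):
            newer = self.stack.pop()
            older = self.stack.pop()
            self.stack.append(merge(older, newer))

    def finish(self):
        result = []
        for run in reversed(self.stack):    # smallest (newest) subfiles first
            result = merge(run, result)     # older records win ties, so it stays stable
        return result
```

After 13 insertions the stack holds subfiles of lengths 8, 4, 1, matching $13 = (1101)_2$.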

18. Yes, but it seems to be a complicated job. The first solution to be found used the following ingenious construction [Doklady Akad. Nauk SSSR 186 (1969), 1256–1258]: Let $n \approx \sqrt N$. Divide the file into m + 2 “zones” $Z_1 \ldots Z_m Z_{m+1} Z_{m+2}$, where $Z_{m+2}$ contains N mod n records while each other zone contains exactly n records. Interchange the records of $Z_{m+1}$ with the zone containing $R_M$; the file now takes the form $Z_1 \ldots Z_m A$, where each of $Z_1 \ldots Z_m$ contains exactly n records in order and where A is an auxiliary area containing s records, for some s in the range $n \le s < 2n$.

Find the zone with smallest leading element, and exchange that entire zone with $Z_1$; if more than one zone has the smallest leading element, choose one that has the smallest trailing element. (This takes $O(m + n)$ operations.) Then find the zone with the next smallest leading and trailing elements, and exchange it with $Z_2$, etc. Finally in $O(m(m + n)) = O(N)$ operations we will have rearranged the m zones so that their leading elements are in order. Furthermore, because of our original assumptions about the file, each of the keys in $Z_1 \ldots Z_m$ will now have fewer than n inversions.

We can merge $Z_1$ with $Z_2$, using the following trick: Interchange $Z_1$ with the first n elements A′ of A; then merge $Z_2$ with A′ in the usual way but exchanging elements with the elements of $Z_1 Z_2$ as they are output. For example, if n = 3 and $x_1 < y_1 < x_2 < y_2 < x_3 < y_3$, we have


                          Zone 1       Zone 2       Auxiliary
    Initial contents:     x1 x2 x3     y1 y2 y3     a1 a2 a3
    Exchange Z1:          a1 a2 a3     y1 y2 y3     x1 x2 x3
    Exchange x1:          x1 a2 a3     y1 y2 y3     a1 x2 x3
    Exchange y1:          x1 y1 a3     a2 y2 y3     a1 x2 x3
    Exchange x2:          x1 y1 x2     a2 y2 y3     a1 a3 x3
    Exchange y2:          x1 y1 x2     y2 a2 y3     a1 a3 x3
    Exchange x3:          x1 y1 x2     y2 x3 y3     a1 a3 a2

(The  merge  is  always  complete  when  the  nth  element  of  the  auxiliary  area  has  been 
exchanged;  this  method  generally  permutes  the  auxiliary  records.) 
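The exchange-only merge can be sketched in Python as follows. It assumes 0-based positions, two adjacent zones of n sorted records, and an auxiliary area of at least n records that overlaps neither zone. Only the zone records are ever compared, so the auxiliary records may be arbitrary labels. The function name is ours.

```python
def merge_with_buffer(arr, z1, aux, n):
    """Merge the sorted zones arr[z1:z1+n] and arr[z1+n:z1+2n] in place, using
    arr[aux:aux+n] as scratch space; every data movement is an exchange, so the
    auxiliary records end up permuted but never lost."""
    z2 = z1 + n
    for t in range(n):                              # interchange Z1 with A'
        arr[z1 + t], arr[aux + t] = arr[aux + t], arr[z1 + t]
    i, j, k = aux, z2, z1                           # heads of A' and Z2; next output slot
    while i < aux + n:                              # done once the nth element of A' is out
        if j < z2 + n and arr[j] < arr[i]:
            arr[k], arr[j] = arr[j], arr[k]         # output the next y
            j += 1
        else:
            arr[k], arr[i] = arr[i], arr[k]         # output the next x
            i += 1
        k += 1                                      # invariant: k <= j, so no unread y is overwritten
```

Running it on the example, with x = 1, 3, 5 and y = 2, 4, 6, reproduces the final row of the table: the auxiliary labels come out as a1 a3 a2.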

The trick above is used to merge $Z_1$ with $Z_2$, then $Z_2$ with $Z_3$, ..., $Z_{m-1}$ with $Z_m$, requiring a total of $O(mn) = O(N)$ operations. Since no element has more than n inversions, the $Z_1 \ldots Z_m$ portion of the file has been completely sorted.

For the final “cleanup,” we sort $R_{N+1-2s} \ldots R_N$ by insertion, in $O(s^2) = O(N)$ steps; this brings the s largest elements into area A. Then we merge $R_1 \ldots R_{N-2s}$ with $R_{N+1-2s} \ldots R_{N-s}$, using the trick above with auxiliary storage area A (but interchanging the roles of right and left, less and greater, throughout). Finally, we sort $R_{N+1-s} \ldots R_N$ by insertion.

Subsequent  refinements  are  discussed  by  J.  Katajainen,  T.  Pasanen,  and  J.  Teuhola 
in  Nordic  J.  Computing  3 (1996),  27-40.  See  answer  5.5-3  for  the  problem  of  stable 
merging  in  place. 

19. We may number the input cars so that the final permutation has them in order, $1\,2\,\ldots\,2^n$; so this is essentially a sorting problem. First move the first $2^{n-1}$ cars through n − 1 stacks, putting them in decreasing order, and transfer them to the nth stack so that the smallest is on top. Then move the other $2^{n-1}$ cars through n − 1 stacks, putting them into increasing order and leaving them positioned just before the nth stack. Finally, merge the two sequences together in the obvious way.

20.  For  further  information,  see  R.  E.  Tarjan,  JACM  19  (1972),  341-346. 

22.  See  Information  Processing  Letters  2 (1973),  127-128. 

23. The merges can be represented by a binary tree that has all external nodes on levels $\lfloor \lg N\rfloor$ and $\lceil \lg N\rceil$. Therefore the maximum number of comparisons is the minimum external path length of a binary tree with N external nodes, Eq. 5.3.1–(34), minus N − 1, since $f(m, n) = m + n - 1$ gives the maximum and there are N − 1 merges. (See also Eq. 5.4.9–(1).)
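A Python check of this identity. It assumes that the top-down method splits N into ⌈N/2⌉ and ⌊N/2⌋ and charges m + n − 1 comparisons per merge. It also assumes that the minimum external path length has all external nodes on levels q and q + 1 with q = ⌊lg N⌋. The function names are ours.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def top_down_worst(N):
    """Maximum comparisons of top-down merge sort, charging m + n - 1 per (m, n) merge."""
    if N <= 1:
        return 0
    half = N // 2
    return top_down_worst(N - half) + top_down_worst(half) + N - 1


def min_external_path_length(N):
    """All N external nodes on levels q and q + 1, where q = floor(lg N)."""
    q = N.bit_length() - 1
    return N * q + 2 * (N - (1 << q))
```

Equivalently, the worst case is $N\lceil\lg N\rceil - 2^{\lceil\lg N\rceil} + 1$.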

General techniques for studying the asymptotic properties of such recurrences with the help of Mellin transforms have been presented by P. Flajolet and M. Golin in Acta Informatica 31 (1994), 673–696; in particular, they show that the average number of comparisons is $N\lg N - \alpha N + \delta(\lg N)N + O(1)$ and the variance is $\sim .345N$, where δ is a continuous function of period 1 and average value 0, and

$$\alpha = \frac{1}{\ln 2} - \frac12 + \frac{1}{\ln 2}\sum_{m\ge1}\frac{2}{(m+1)(m+2)}\ln\frac{2m+1}{2m} = 1.24815\ 20420\ 99653\ 84890\ 29565\ 64329\ 53240\ 16127+.$$
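The series for α converges like $1/(2M^2)$ after M terms, so a plain floating-point sum already reproduces about ten digits. Here is a Python sketch; the function name and the default term count are our choices.

```python
import math


def alpha_top_down(terms=200_000):
    """1/ln2 - 1/2 + (1/ln2) * sum_{m>=1} 2/((m+1)(m+2)) * ln((2m+1)/(2m)),
    truncated after `terms` terms (tail below about 1/(2*terms^2))."""
    s = math.fsum(2.0 / ((m + 1) * (m + 2)) * math.log1p(1.0 / (2 * m))
                  for m in range(1, terms + 1))
    ln2 = math.log(2)
    return 1 / ln2 - 0.5 + s / ln2
```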

The total number of comparisons is well approximated by a normal distribution as $N \to \infty$; see the complementary analyses by H.-K. Hwang and M. Cramer in Random Structures & Algorithms 8 (1996), 319–336; 11 (1997), 81–96.


SECTION  5.2.5 


1. No, because radix sorting doesn’t work at all unless the distribution sorting is stable, after the first pass. (But the suggested distribution sort could be used in a most-significant-digit-first radix sorting method, generalizing radix exchange, as suggested in the last paragraph of the text.)
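To make the role of stability concrete, here is a minimal least-significant-digit-first radix sort in Python. The radix and digit count are parameters of our own choosing. Each pass appends records to piles in their current order, so every pass is stable, and that is what preserves the work of earlier passes.

```python
def lsd_radix_sort(keys, radix=10, digits=3):
    """Least-significant-digit-first radix sort of integers in [0, radix**digits)."""
    for d in range(digits):
        piles = [[] for _ in range(radix)]
        for key in keys:                     # distribution in current order: stable
            piles[key // radix ** d % radix].append(key)
        keys = [key for pile in piles for key in pile]
    return keys
```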

2. It is “anti-stable,” just the opposite; elements with equal keys appear in reverse order, since the first pass goes through the records from $R_N$ to $R_1$. (This proves to be convenient because of lines 28 and 20 of Program R, equating Λ with 0; but of course it is not necessary to make the first pass go backwards.)

3. If pile 0 is not empty, BOTM[0] already points to the first element; if it is empty, we set P ← LOC(BOTM[0]) and later make LINK(P) point to the bottom of the first nonempty pile.

4.  When  there  are  an  even  number  of  passes  remaining,  take  pile  0 first  (top  to 
bottom),  followed  by  pile  1,  . . . , pile  (M  — 1);  the  result  will  be  in  order  with  respect 
to  the  digits  examined  so  far.  When  there  are  an  odd  number  of  passes  remaining, 
take  pile  (M  — 1)  first,  then  pile  (M  — 2),  . . . , pile  0;  the  result  will  be  in  reverse  order 
with  respect  to  the  digits  examined  so  far.  (This  rule  was  apparently  first  published 
by  E.  H.  Friend  [JACM  3 (1956),  156,  165-166].) 

5. Change line 04 to ‘ENT3 7’, and change the R3SW and R5SW tables to:

    R3SW  LD2   KEY,1(1:1)
          LD2   KEY,1(2:2)
          LD2   KEY,1(3:3)
          LD2   KEY,1(4:4)
          LD2   KEY,1(5:5)
          LD2   INPUT,1(1:1)
          LD2   INPUT,1(2:2)
          LD2   INPUT,1(3:3)

    R5SW  LD1   INPUT,1(LINK)
          (repeat the previous line six more times)
          DEC1  1   ▮

The new running time is found by changing “3” to “8” everywhere; it amounts to $(11p - 1)N + 16pM + 12p - 4E + 2$, for p = 8.

6. (a) Consider placing an (N + 1)st element. The recurrence

$$p_{M(N+1)k} = \frac{k+1}{M}\,p_{MN(k+1)} + \frac{M-k}{M}\,p_{MNk}$$

is equivalent to the stated formula. (b) The nth derivative satisfies $g^{(n)}_{M(N+1)}(z) = (1 - n/M)\,g^{(n)}_{MN}(z) + \bigl((1 - z)/M\bigr)\,g^{(n+1)}_{MN}(z)$, by induction on n. Setting z = 1, we find $g^{(n)}_{MN}(1) = (1 - n/M)^N M^{\underline{n}}$, since $g_{M0}(z) = z^M$. Hence $\mathrm{mean}(g_{MN}) = (1 - 1/M)^N M$, $\mathrm{var}(g_{MN}) = (1 - 2/M)^N M(M - 1) + (1 - 1/M)^N M - (1 - 1/M)^{2N} M^2$. (Notice that the generating function for E in Program R is $g_{MN}(z)^p$.)
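The recurrence in (a) can be run directly with exact fractions to confirm the mean and variance in (b). In this Python sketch, which is our own, `p[k]` is the probability that exactly k of the M piles are empty after N random placements.

```python
from fractions import Fraction


def empty_pile_distribution(M, N):
    """p[k] = Pr(exactly k of M piles are empty after N uniformly random placements)."""
    p = [Fraction(0)] * (M + 1)
    p[M] = Fraction(1)                      # g_{M0}(z) = z^M: all piles start empty
    for _ in range(N):
        q = [Fraction(0)] * (M + 1)
        for k in range(M + 1):
            if k + 1 <= M:                  # the new element lands in one of k + 1 empty piles
                q[k] += Fraction(k + 1, M) * p[k + 1]
            q[k] += Fraction(M - k, M) * p[k]   # it lands in an occupied pile
        p = q
    return p
```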

7. Let R = radix sort, RX = radix exchange. Some of the important similarities and differences: RX goes from most significant digit to least significant, while R goes the other way. Both methods sort by digit inspections, without making comparisons of keys. RX always has M = 2 (but see exercise 1). The running time for R is almost unvarying, while RX is sensitive to the distribution of the digits. In both cases the running time is O(N log K), where K is the range of keys, but the constant of proportionality is higher for RX; on the other hand, when the keys are uniformly distributed in their leading digits, RX has an average running time of O(N log N) regardless of the size of K. R requires link fields while RX runs in minimal space. The inner loop of R is more suited to pipeline computers.

8. On the final pass, the piles should be hooked together in another order; for example, if M = 256, pile $(10000000)_2$ comes first, then pile $(10000001)_2$, ..., pile $(11111111)_2$, pile $(00000000)_2$, pile $(00000001)_2$, ..., pile $(01111111)_2$. This change in hooking order can be done easily by modifying Algorithm H, or (in Table 1) by changing the storage allocation strategy, on the last pass.
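A Python sketch of this hooking rule. It assumes 32-bit two's complement keys sorted with M = 256 in four least-significant-byte-first passes. Only the final pass takes piles 128 through 255 before piles 0 through 127. The function name and word size are our assumptions.

```python
def radix_sort_signed32(keys):
    """LSD radix sort (M = 256, 4 passes) of integers in [-2**31, 2**31)
    held as two's complement bit patterns; only the last pass changes the
    order in which the piles are hooked together."""
    words = [k & 0xFFFFFFFF for k in keys]           # two's complement patterns
    for d in range(4):
        piles = [[] for _ in range(256)]
        for w in words:
            piles[(w >> (8 * d)) & 0xFF].append(w)
        if d < 3:
            order = range(256)
        else:                                        # negative keys have the top bit set
            order = list(range(128, 256)) + list(range(128))
        words = [w for b in order for w in piles[b]]
    return [w - (1 << 32) if w >= 1 << 31 else w for w in words]
```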

9. We could first separate the negative keys from the positive keys, as in exercise 5.2.2–33; or we could change the keys to complement notation on the first pass. Alternatively, after the last pass we could separate the positive keys from the negative ones, reversing the order of the latter, although the method of exercise 5.2.2–33 no longer applies.

11. Without the first pass the method would still sort perfectly, because (by coincidence) 503 already precedes 509. Without the first two passes, the number of inversions would be 1 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 0 + 0 = 5.

12. After exchanging $R_k$ with $R[P]$ in step M4 (exercise 5.2–12), we can compare $K_k$ to $K_{k-1}$. If $K_k$ is less, we compare it to $K_{k-2}, K_{k-3}, \ldots$, until finding $K_k \ge K_j$. Then set $(R_{j+1}, \ldots, R_{k-1}, R_k) \leftarrow (R_k, R_{j+1}, \ldots, R_{k-1})$, without changing the LINK fields. It is convenient to place an artificial key $K_0$, which is $\le$ all other keys, at the left of the file.

14. If the original permutation of the cards requires k readings, in the sense of exercise 5.1.3–20, and if we use m piles per pass, we must make at least $\lceil \log_m k\rceil$ passes. (Consider going back from a sorted deck to the original one; the number of readings increases by at most a factor of m on each pass.) The given permutation requires 4 increasing readings, 10 decreasing readings; hence decreasing order requires 4 passes with two piles or 3 passes with three piles.

Conversely, this optimum number of passes can be achieved: Label each card with a number from 0 to k − 1 according to the reading it belongs to, and use a radix sort on these labels (least significant digit first in radix m). [See Martin Gardner’s Sixth Book of Mathematical Games (San Francisco: W. H. Freeman, 1971), 111–112.]
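A Python sketch of this construction for a deck holding the cards 1 to n. A reading scans the deck left to right picking up 1, 2, 3, ... in order, so card v starts a new reading exactly when it lies to the left of v − 1. The function names are hypothetical.

```python
def reading_numbers(deck):
    """Return ({card: reading index}, k) for a deck holding the cards 1..n."""
    pos = {card: i for i, card in enumerate(deck)}
    label, r = {}, 0
    for v in range(1, len(deck) + 1):
        if v > 1 and pos[v] < pos[v - 1]:
            r += 1                      # v lies to the left of v - 1: a new reading starts
        label[v] = r
    return label, r + 1


def sort_by_readings(deck, m):
    """LSD radix sort of the reading labels in radix m, using ceil(log_m k)
    distribution passes; returns (sorted deck, passes)."""
    label, k = reading_numbers(deck)
    passes = 0
    while m ** passes < k:
        passes += 1
    for d in range(passes):
        piles = [[] for _ in range(m)]
        for card in deck:               # stable distribution on digit d of the label
            piles[label[card] // m ** d % m].append(card)
        deck = [card for pile in piles for card in pile]
    return list(deck), passes
```

Within one reading the cards already appear in increasing order, so grouping the cards stably by label is enough to sort the deck.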

15. Let there be k readings and m piles. The order is reversed on each pass; if there are k readings in one order, the number of readings in the opposite order is n + 1 − k. The minimum number of passes is the smaller of two numbers: the smallest even number greater than or equal to $\log_m k$, and the smallest odd number greater than or equal to $\log_m(n + 1 - k)$. (Going backwards, there are at most m decreasing readings after one pass, $m^2$ increasing readings after two passes, etc.) The example can be sorted into increasing order in min(2, 5) = 2 passes, into decreasing order in min(3, 4) = 3 passes, using only two piles.

16. Assume that each string is followed by a special null character that is less than any letter of the alphabet. Perform a left-to-right radix sort by starting with all strings linked together in a single block of data. Then for k = 1, 2, ..., refine every block that contains more than one distinct string by splitting it into subblocks based on the kth letter of each string, meanwhile keeping the blocks sorted by their already-examined prefixes. When a block has only one item, or when its kth characters are all null (so that its keys are identical), we can arrange to avoid examining it again. [R. Paige and R. E. Tarjan, SICOMP 16 (1987), 973–989, §2.] This process is essentially that of constructing a trie as in Section 6.3. A simpler but slightly less efficient algorithm based on right-to-left radix sort was given for this problem by Aho, Hopcroft, and Ullman, The Design and Analysis of Computer Algorithms (Addison-Wesley, 1974), 79–84. The methods of McIlroy, Bostic, and McIlroy, cited in the text, are faster yet in practice.
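A simple recursive Python sketch of the left-to-right refinement. "End of string" plays the role of the null character, so finished strings come first in their block and are never examined again. This is only an illustration of the idea, not the linear-time block-refinement method of Paige and Tarjan, and the function name is ours.

```python
def msd_string_sort(strings, k=0):
    """Left-to-right radix sort: split a block on the kth character, treating
    'end of string' as a null character smaller than every letter."""
    if len(strings) <= 1:
        return list(strings)
    finished = [s for s in strings if len(s) == k]   # kth character is null: identical keys
    buckets = {}
    for s in strings:
        if len(s) > k:
            buckets.setdefault(s[k], []).append(s)
    out = finished
    for ch in sorted(buckets):                       # subblocks in alphabetic order
        out += msd_string_sort(buckets[ch], k + 1)
    return out
```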


17. MacLaren’s method speeds up the second level, but it cannot be used at the top level because it does not compute the numbers $N_k$.

18. First we prove the hint: Let $p_k = \int_{k/CN}^{(k+1)/CN} f(x)\,dx$ be the probability that a key falls into bin k when there are CN bins. The time needed to distribute the records is O(N), and the average number of inversions remaining after distribution is

$$\frac12\sum_{k=0}^{CN-1}\sum_j\binom{N}{j}p_k^j(1-p_k)^{N-j}\binom{j}{2} = \frac12\sum_{k=0}^{CN-1}\binom{N}{2}p_k^2 < \frac14 N\sum_{k=0}^{CN-1}p_k B/C,$$

because $p_k \le B/CN$.

Now consider two levels of distribution, with cN top-level bins, and let $b_k = \sup\{f(x) \mid k/cN \le x < (k+1)/cN\}$. Then the average total running time is O(N) plus $\sum_{k=0}^{cN-1} T_k$, where $T_k$ is the average time needed by MacLaren’s method to sort $N_k$ keys having the density function $f_k(x) = f\bigl((k + x)/cN\bigr)/cNp_k$. By the hint, we have $T_k = E\,O(b_k N_k/cNp_k)$, because $f_k(x)$ is bounded by $b_k/cNp_k$. But $E\,N_k = Np_k$, so $T_k = O(b_k/c)$. And as $N \to \infty$ we have $\sum_{k=0}^{cN-1} b_k/c \to N\int_0^1 f(x)\,dx = N$, by the definition of Riemann integrability.


Again  the  external  path  length  is  112  (optimum). 
2. In the notation of exercise 5.2.4–14,

$$\begin{aligned} L(n) - B(n) &= \sum_{k=1}^{t}\bigl((e_k + k - 1)2^{e_k} - (e_1 + 1)2^{e_k}\bigr) + 2^{e_1+1} - 2^{e_t}\\ &= 2^{e_1} - 2^{e_t} - \sum_{k=2}^{t}(e_1 - e_k + 2 - k)2^{e_k}\\ &\ge 2^{e_1} - \bigl(2^{e_1-1} + \cdots + 2^{e_1-t+1} + 2^{e_t}\bigr) \ge 0, \end{aligned}$$

with equality if and only if $n = 2^k - 2^j$ for some $k > j \ge 0$. [When merging is done “top-down” as in exercise 5.2.4–23, the maximum number of comparisons is B(n).]
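A Python check of this inequality. It takes L(n) to be $\sum(m + n' - 1)$ over the multiset $M_n$ of merge sizes from exercise 5.2.4–13, as exercise 5.2.4–14 suggests, and $B(n) = \sum_{k\le n}\lceil\lg k\rceil$. It writes $n = 2^{e_1} + \cdots + 2^{e_t}$ with $e_1 > \cdots > e_t$. The helper names are ours.

```python
def merge_sizes(N):
    """M_N as a list of (m, n) pairs, via M_{2^k+r} = {(2^k, r)} + M_{2^k} + M_r."""
    if N <= 1:
        return []
    big = 1 << ((N - 1).bit_length() - 1)
    return [(big, N - big)] + merge_sizes(big) + merge_sizes(N - big)


def L(n):
    """Worst case of Algorithm L: an (m, n') merge makes at most m + n' - 1 comparisons."""
    return sum(a + b - 1 for a, b in merge_sizes(n))


def B(n):
    """Binary insertion: sum of ceil(lg k) for k = 1..n."""
    return sum((k - 1).bit_length() for k in range(1, n + 1))


def first_line(n):
    """First expression of the display, with n = 2^e1 + ... + 2^et, e1 > ... > et."""
    e = [j for j in range(n.bit_length() - 1, -1, -1) if n >> j & 1]
    t = len(e)
    total = sum((e[k - 1] + k - 1) * 2 ** e[k - 1] - (e[0] + 1) * 2 ** e[k - 1]
                for k in range(1, t + 1))
    return total + 2 ** (e[0] + 1) - 2 ** e[t - 1]
```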

3. When n > 0, the number of outcomes such that the smallest key appears exactly k times is $\binom{n}{k}P_{n-k}$. Thus $2P_n = \sum_k\binom{n}{k}P_{n-k}$, for n > 0, and we have $2P(z) = e^z P(z) + 1$ by Eq. 1.2.9–(10).

Another proof comes from the fact that $P_n = \sum_{k\ge0}{n \brace k}k!$, since ${n \brace k}$ is the number of ways to partition n elements into k nonempty parts and these parts can be permuted in k! ways. Thus $\sum_{n\ge0}P_n z^n/n! = \sum_{k\ge0}(e^z - 1)^k = 1/(2 - e^z)$ by Eq. 1.2.9–(23).

Still another proof, perhaps the most interesting, arises if we arrange the elements in sequence in a stable manner, so that $K_i$ precedes $K_j$ if and only if $K_i < K_j$ or ($K_i = K_j$ and $i < j$). Among all $P_n$ outcomes, a given arrangement $K_{a_1} \ldots K_{a_n}$ now occurs exactly $2^k$ times if the permutation $a_1 \ldots a_n$ contains k ascents; hence $P_n$ can be expressed in terms of the Eulerian numbers, $P_n = \sum_k\left\langle{n \atop k}\right\rangle 2^k$. Eq. 5.1.3–(20) with z = 2 now establishes the desired result.

This generating function was obtained by A. Cayley [Phil. Mag. (4) 18 (1859), 374–378] in connection with the enumeration of an imprecisely defined class of trees. See also P. A. MacMahon, Proc. London Math. Soc. 22 (1891), 341–344; J. Touchard, Ann. Soc. Sci. Bruxelles 53 (1933), 21–31; and O. A. Gross, AMM 69 (1962), 4–8, who gave the interesting formula $P_n = \sum_{k\ge1}k^n/2^{1+k}$, $n \ge 1$.
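All four descriptions of $P_n$ can be compared numerically. The Python sketch below computes them from the recurrence, the Stirling-number sum, the Eulerian-number sum, Gross's series, and a brute-force count of outcome patterns for small n. The helper names are ours.

```python
from itertools import product
from math import comb, factorial


def fubini(N):
    """P_0..P_N from P_n = sum_{k>=1} C(n, k) P_{n-k}, i.e. 2P_n = sum_k C(n, k) P_{n-k}."""
    P = [1]
    for n in range(1, N + 1):
        P.append(sum(comb(n, k) * P[n - k] for k in range(1, n + 1)))
    return P


def stirling2(n, k):
    """Stirling subset numbers {n brace k}."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def eulerian(n, k):
    """Eulerian numbers: permutations of n elements with k ascents."""
    if n == 0:
        return 1 if k == 0 else 0
    if k < 0 or k >= n:
        return 0
    return (k + 1) * eulerian(n - 1, k) + (n - k) * eulerian(n - 1, k - 1)


def outcomes_brute_force(n):
    """Count distinct outcomes of comparing n keys, i.e. patterns of relative ranks."""
    seen = set()
    for values in product(range(n), repeat=n):
        ranks = sorted(set(values))
        seen.add(tuple(ranks.index(v) for v in values))
    return len(seen)
```

The first values are 1, 1, 3, 13, 75, 541, 4683.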

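4. The representation

$$2P(z) = \frac12\Bigl(1 - i\cot\frac{i(z - \ln 2)}{2}\Bigr) = \frac12 - \frac{1}{z - \ln 2} - \sum_{k\ge1}\Bigl(\frac{1}{z - \ln 2 - 2\pi ik} + \frac{1}{z - \ln 2 + 2\pi ik}\Bigr)$$

yields the convergent series $P_n/n! = \frac12(\ln 2)^{-n-1} + \sum_{k\ge1}\Re\bigl((\ln 2 + 2\pi ik)^{-n-1}\bigr)$, for $n \ge 1$.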
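A floating-point check of this series in Python, truncating the sum over k at a cutoff K of our own choosing. It also shows how close the leading term $\frac12(\ln 2)^{-n-1}n!$ already comes for $n \ge 2$.

```python
import math


def fubini_exact(N):
    """P_0..P_N from P_n = sum_{k>=1} C(n, k) P_{n-k}."""
    P = [1]
    for n in range(1, N + 1):
        P.append(sum(math.comb(n, k) * P[n - k] for k in range(1, n + 1)))
    return P


def fubini_series(n, K=20000):
    """n! * ((1/2)(ln 2)^(-n-1) + sum_{k=1}^{K} Re((ln 2 + 2*pi*i*k)^(-n-1))), for n >= 1."""
    a = math.log(2)
    s = 0.5 * a ** (-n - 1)
    for k in range(1, K + 1):
        s += (complex(a, 2 * math.pi * k) ** (-n - 1)).real
    return math.factorial(n) * s
```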


6. $S'(n) \ge S(n)$, since the keys might all be distinct; thus we must show that $S'(n) \le S(n)$. Given a sorting algorithm that takes S(n) steps on distinct keys, we can construct a sorting algorithm for the general case by defining the = branch to be identical to the < branch, removing redundancies. When an external node appears, we know all of the equality relations, since we have $K_{a_1} \le K_{a_2} \le \cdots \le K_{a_n}$ and an explicit comparison $K_{a_i} : K_{a_{i+1}}$ has been made for $1 \le i < n$.

Fig. A–2. Solution to exercise 8.

M. Paterson observes that if the multiplicities of keys are $(n_1, \ldots, n_m)$, the number of comparisons can be reduced to $n\lg n - \sum_j n_j\lg n_j + O(n)$; see SICOMP 5 (1976), 2. This lower bound can almost be reached without substantial auxiliary memory by adapting heapsort to equal keys as suggested by Munro and Raman in Lecture Notes in Comp. Sci. 519 (1991), 473–480.

7. See Fig. A–1. The average number of comparisons is (2 + 3 + 3 + 2 + 3 + 3 + 3 + 2·3 + 3 + 3 + 3 + 2 + 3 + 3 + 2)/16 = 2¾.

8.  See  Fig.  A-2.  The  average  number  of  comparisons  is  3§y. 

9. We need at least n − 1 comparisons to discover that all keys are equal, if they are. Conversely, n − 1 comparisons always suffice, since we can always deduce the final ordering after comparing $K_1$ with all of the other keys.

10. Let f(n) be the desired function, and let g(n) be the minimum average number of comparisons needed to sort n + k elements when k > 0 and exactly k of the elements have known values (0 or 1). Then f(0) = f(1) = g(0) = 0, g(1) = 1; $f(n) = 1 + \frac12 f(n-1) + \frac12 g(n-2)$, $g(n) = 1 + \min\bigl(g(n-1), \frac12 g(n-1) + \frac12 g(n-2)\bigr) = 1 + \frac12 g(n-1) + \frac12 g(n-2)$, for $n \ge 2$. (Thus the best strategy is to compare two unknown elements whenever possible.) It follows that $f(n) - g(n) = \frac12\bigl(f(n-1) - g(n-1)\bigr)$ for $n \ge 2$, and $g(n) = \frac23\bigl(n + \frac13(1 - (-\frac12)^n)\bigr)$ for $n \ge 0$. Hence the answer is

$$\tfrac23 n + \tfrac29 - \tfrac29\bigl(-\tfrac12\bigr)^n - \bigl(\tfrac12\bigr)^{n-1}, \qquad \text{for } n \ge 1.$$

(This exact formula may be compared with the information-theoretic lower bound, $\log_3(2^n - 1) \approx 0.6309n$.)
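The closed forms can be checked against the recurrences with exact fractions. This Python sketch and its function names are our own.

```python
from fractions import Fraction


def f_and_g(N):
    """Tabulate f(n) and g(n) for n = 0..N (N >= 1) from the recurrences in the text."""
    f = [Fraction(0), Fraction(0)]
    g = [Fraction(0), Fraction(1)]
    for n in range(2, N + 1):
        g.append(1 + min(g[n - 1], g[n - 1] / 2 + g[n - 2] / 2))
        f.append(1 + f[n - 1] / 2 + g[n - 2] / 2)
    return f, g


def g_closed(n):
    return Fraction(2, 3) * (n + Fraction(1, 3) * (1 - Fraction(-1, 2) ** n))


def f_closed(n):
    return (Fraction(2, 3) * n + Fraction(2, 9)
            - Fraction(2, 9) * Fraction(-1, 2) ** n - Fraction(1, 2) ** (n - 1))
```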

11. Binary insertion proves that $S_m(n) \le B(m) + (n - m)\lceil\lg(m + 1)\rceil$, for $n \ge m$. On the other hand $S_m(n) \ge \bigl\lceil\lg\bigl(m!\,{n \brace m}\bigr)\bigr\rceil$, and this is asymptotically $n\lg m + O\bigl(((m - 1)/m)^n\bigr)$; see Eq. 1.2.6–(53).

12. (a) If there are no redundant comparisons, we can arbitrarily assign an order to keys that are actually equal, when they are first compared, since no order can be deduced from previously made comparisons. (b) Assume that the tree strongly sorts every sequence of zeros and ones; we shall prove that it strongly sorts every permutation of {1, 2, ..., n}. Suppose it doesn’t; then there is a permutation for which it claims that $K_{a_1} \le K_{a_2} \le \cdots \le K_{a_n}$, whereas in fact $K_{a_i} > K_{a_{i+1}}$ for some i. Replace all elements $< K_{a_i}$ by 0 and all elements $\ge K_{a_i}$ by 1; by assumption the method will now sort when we take the path that leads to $K_{a_1} \le K_{a_2} \le \cdots \le K_{a_n}$, a contradiction.

13. If n is even, $F(n) - F(n-1) = 1 + F(\lfloor n/2\rfloor) - F(\lfloor n/2\rfloor - 1)$, so we must prove that $w_{k-1} < \lfloor n/2\rfloor \le w_k$; this is obvious since $w_{k-1} = \lfloor w_k/2\rfloor$. If n is odd, $F(n) - F(n-1) = G(\lceil n/2\rceil) - G(\lfloor n/2\rfloor)$, so we must prove that $t_{k-1} < \lceil n/2\rceil \le t_k$; this is obvious since $t_{k-1} = \lceil w_k/2\rceil$.

14. By exercise 1.2.4–42, the sum is $n\lceil\lg\frac34 n\rceil - (w_1 + \cdots + w_j)$, where $w_j < n \le w_{j+1}$. The latter sum is $w_{j+1} - \lfloor j/2\rfloor - 1$. We can therefore express F(n) in the form $n\lceil\lg\frac34 n\rceil - \lfloor 2^{\lfloor\lg 6n\rfloor}/3\rfloor + \lfloor\frac12\lg(6n)\rfloor$ (and in many other ways).
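A Python check of these expressions. It takes $F(n) = \sum_{k=1}^{n}\lceil\lg\frac34 k\rceil$ and assumes the indexing $w_i = \lfloor 2^{i+1}/3\rfloor$ (so $w_0, w_1, w_2, \ldots = 0, 1, 2, 5, 10, 21, \ldots$), which is the indexing under which the identities above check out. All arithmetic is exact integer arithmetic; the helper names are ours.

```python
import math


def ceil_lg_three_quarters(k):
    """ceil(lg(3k/4)) for k >= 1, computed exactly with integers."""
    c = 0
    while 4 * 2 ** c < 3 * k:                # i.e. while 2^c < 3k/4
        c += 1
    return c


def w(i):
    """Assumed indexing: w_i = floor(2^(i+1) / 3)."""
    return 2 ** (i + 1) // 3


def F_via_w(n):
    """n*ceil(lg(3n/4)) - (w_1 + ... + w_j), where w_j < n <= w_{j+1}."""
    j = 0
    while w(j + 1) < n:
        j += 1
    return n * ceil_lg_three_quarters(n) - sum(w(i) for i in range(1, j + 1))


def F_closed(n):
    """n*ceil(lg(3n/4)) - floor(2^floor(lg 6n) / 3) + floor(lg(6n) / 2)."""
    floor_lg_6n = (6 * n).bit_length() - 1
    floor_half_lg_6n = math.isqrt(6 * n).bit_length() - 1
    return n * ceil_lg_three_quarters(n) - 2 ** floor_lg_6n // 3 + floor_half_lg_6n
```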

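15. If $\lceil\lg\frac34 n\rceil = \lg(\frac34 n) + \theta$, $F(n) = n\lg n - (3 - \lg 3)n + n(\theta + 1 - 2^\theta) + O(\log n)$. If $\lceil\lg n\rceil = \lg n + \theta$, $B(n) = n\lg n - n + n(\theta + 1 - 2^\theta) + O(\log n)$. [Note that $\lg n! = n\lg n - n/(\ln 2) + O(\log n)$; $1/(\ln 2) \approx 1.443$; $3 - \lg 3 \approx 1.415$.]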


656  ANSWERS  TO  EXERCISES 


5.3.1 


17. The number of cases with $b_k < a_p < b_{k+1}$ is

$$\binom{m - p + n - k}{m - p}\binom{p - 1 + k}{p - 1},$$

and the number of cases with $a_j < b_q < a_{j+1}$ is

$$\binom{n - q + m - j}{n - q}\binom{q - 1 + j}{q - 1}.$$
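A brute-force check of the first count in Python. It enumerates all interleavings of $a_1 < \cdots < a_m$ with $b_1 < \cdots < b_n$, taking $b_0 = -\infty$ and $b_{n+1} = +\infty$. The second count follows by exchanging the roles of a and b. The function name is ours.

```python
from itertools import combinations
from math import comb


def count_a_between(m, n, p, k):
    """Interleavings in which exactly k of the b's precede a_p, i.e. b_k < a_p < b_{k+1}."""
    count = 0
    for a_slots in combinations(range(m + n), m):
        pos_ap = a_slots[p - 1]                 # slot of a_p in the merged order
        bs_before = pos_ap - (p - 1)            # earlier slots not taken by a's
        if bs_before == k:
            count += 1
    return count
```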

18.  No,  since  we  are  considering  only  the  less  efficient  branch  of  the  tree  below  each 
comparison.  One  of  the  more  efficient  branches  might  turn  out  to  be  harder  to  handle. 

20. Let L be the maximum level on which an external node appears, and let l be the minimum such level. If $L \ge l + 2$, we can remove two nodes from level L and place them below a node at level l; this decreases the external path length by $l + 2L - \bigl(L - 1 + 2(l + 1)\bigr) = L - l - 1 \ge 1$. Conversely, if $L \le l + 1$, let there be k external nodes on level l and N − k on level l + 1, where $0 < k \le N$. By exercise 2.3.4.5–3, $k2^{-l} + (N - k)2^{-l-1} = 1$; hence $N + k = 2^{l+1}$. The inequalities $2^l \le N < 2^{l+1}$ now show that $l = \lfloor\lg N\rfloor$; this defines k and yields the external path length (34).
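A dynamic-programming check in Python: minimizing over all extended binary trees with N external nodes gives $(q + 2)N - 2^{q+1}$ with $q = \lfloor\lg N\rfloor$. That is the value obtained above with $k = 2^{q+1} - N$ nodes on level q. The function name is ours.

```python
def min_epl_dp(limit):
    """E[N] = minimum external path length over all extended binary trees with
    N external nodes: E[1] = 0, E[N] = N + min over 1 <= a < N of E[a] + E[N - a]."""
    E = [0, 0]                      # E[0] is a placeholder
    for N in range(2, limit + 1):
        # Hanging subtrees with a and N - a external nodes below a root pushes
        # every external node down one level, adding N to the path length.
        E.append(N + min(E[a] + E[N - a] for a in range(1, N)))
    return E
```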

21. Let r(x) be the root of x’s right subtree. All subtrees have minimum height if and only if $\lceil\lg t(l(x))\rceil \le \lceil\lg t(x)\rceil - 1$ and $\lceil\lg t(r(x))\rceil \le \lceil\lg t(x)\rceil - 1$ for all x. The first condition is equivalent to $2t(l(x)) - t(x) \le 2^{\lceil\lg t(x)\rceil} - t(x)$, and the second condition is equivalent to $t(x) - 2t(l(x)) \le 2^{\lceil\lg t(x)\rceil} - t(x)$.

22. By exercise 20, the four conditions $\lfloor\lg t(l(x))\rfloor, \lfloor\lg t(r(x))\rfloor \ge \lfloor\lg t(x)\rfloor - 1$ and $\lceil\lg t(l(x))\rceil, \lceil\lg t(r(x))\rceil \le \lceil\lg t(x)\rceil - 1$ are necessary and sufficient. Arguing as in exercise 21, we can prove them equivalent to the stated conditions. [Martin Sandelius, AMM 68 (1961), 133–134.] See exercise 33 for a generalization.

23.  Multiple  list  insertion  assumes  that  the  keys  are  uniformly  distributed  in  a known 
range,  so  it  isn’t  a “pure  comparison”  method  satisfying  the  restrictions  considered  in 
this  section. 

24. First proceed as if sorting five elements, until after five comparisons we reach one of the configurations in (6). In the first three cases, complete sorting the five elements in two more comparisons, then insert the sixth element f. In the other case, first compare f : b, insert f into the main chain, then insert c. [Picard, Théorie des Questionnaires, page 116.]

25.  Since  N = 7!  = 5040  and  q = 13,  there  would  be  8192  — 5040  = 3152  external 
nodes  on  level  12  and  5040  — 3152  = 1888  on  level  13. 

26.  L.  Kollar  [ Lecture  Notes  in  Comp.  Sci.  233  (1986),  449-457]  has  presented  an 
excellent  way  to  verify  that  the  optimum  method  has  an  external  path  length  of  62416. 


27. 
is the only way to recognize the two most frequent permutations with two comparisons,
even  though  the  first  comparison  produces  a .27/. 73  split! 

28. Lun Kwan has constructed an 873-line program whose average running time is 38.925u. Its maximum running time is 43u; the latter appears to be optimal since it is the time for 7 compares, 7 tests, 6 loads, 5 stores.

29. We must make at least S(n) comparisons, because it is impossible to know whether a permutation is even or odd unless we have made enough comparisons to determine it uniquely. For we can assume that enough comparisons have been made to narrow things down to two possibilities that depend on whether or not $a_i$ is less than $a_j$, for some i and j; one of the two possibilities is even, the other is odd. [On the other hand there is an O(n) algorithm for this problem, which simply counts the number of cycles and uses no comparisons at all; see exercise 5.2.2–2.]
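The O(n) cycle-counting method mentioned at the end can be sketched in Python for a permutation of 0, ..., n − 1. Its parity is $(n - \text{cycles}) \bmod 2$, and the computation never compares two keys. The inversion-count version is included only as a slow cross-check; both names are ours.

```python
def parity_by_cycles(perm):
    """Parity (0 = even, 1 = odd) of a permutation of 0..n-1 as (n - cycles) mod 2;
    it follows the permutation's own links and makes no key comparisons."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return (n - cycles) % 2


def parity_by_inversions(perm):
    """Reference version: the number of inversions mod 2 (O(n^2) comparisons)."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)) % 2
```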

30. Start with an optimal comparison tree of height S(n); repeatedly interchange i ↔ j in the right subtree of a node labeled i:j, from top to bottom. Interpreting the result as a comparison-exchange tree, every terminal node defines a unique permutation that can be sorted by at most n − 1 more comparison-exchanges (by exercise 5.2.2–2).

[The idea of a comparison-exchange tree is due to T. N. Hibbard.]

31. At least 8 are required, since every tree of height 7 will produce the configuration [diagram]
(or its dual) in some branch after 4 steps, with $a \ne 1$. This configuration cannot be
sorted in 3 more comparison/exchange operations. On the other hand the following
tree achieves the desired bound (and perhaps also the minimum average number of
comparison/exchanges): [tree diagram]


33. Simple operations can be applied to any tree of order $x$ and resolution 1 to
yield another whose weighted path length is no greater, where all external nodes lie on
levels $k$ and $k - 1$ for some $k$, and at most one external node is noninteger. Furthermore,
the noninteger external node lies on level $k$, if such a node is present. The weighted
path length of any such tree has the stated value, so this must be minimal. Conversely,
if (iv) and (v) hold in any real-valued search tree it is possible to show by induction
that the weighted path length has the stated value, since there is a simple formula for
the weighted path length of a tree in terms of the weighted path lengths of the two
subtrees of the root.

36. [Mat. Zametki 4 (1968), 511-518.] See S. Felsner and W. T. Trotter, Combinatorics,
Paul Erdős is Eighty 1 (1993), 145-157, for a summary of progress on this
problem, and for a proof that we can always achieve

$$1 \le T(G_1)/T(G_2) \le \rho,$$

where the constant $\rho$ is slightly less than $8/3$.


SECTION  5.3.2 

1. $S(m + n) \le S(m) + S(n) + M(m,n)$.

2. The internal node that is $k$th in symmetric order corresponds to the comparison
$A_1:B_k$.

3.  Strategy  B(l,  l)  is  no  better  than  strategy  A(l,  1+1),  and  strategy  B' (1,1)  no  better 
than  A'(l,  l— 1);  hence  we  must  solve  the  recurrence 

,M.(l,n)=  min  max(  max  (l+.M. (1,  l— 1)),  max  (1+.M.(1,  n— /))),  n > 1; 

l<j<n  l<i<j 

.M.(l,  0)  = 0. 

It is not difficult to verify that $\lceil \lg(n + 1)\rceil$ satisfies this recurrence.

4.  No.  [C.  Christen,  FOCS  19  (1978),  259-266.] 

6. Strategy $A'(i, i+1)$ can be used when $j = i + 1$, except when $i < 2$. And we can
use strategy $A(i, i+2)$ when $j \ge i + 2$.

7. To insert $k + m$ elements among $n$ others, independently insert $k$ elements and
$m$ elements. (When $k$ and $m$ are large, an improved procedure is possible; see
exercise 19.)

8, 9. In the following diagrams, $i:j$ denotes the comparison $A_i:B_j$, $M_{ij}$ denotes
merging $i$ elements with $j$ in $M(i,j)$ steps, and A denotes sorting the pattern [diagram]
or [diagram] in three steps. [Diagrams not reproduced.]
11. Let $n = g_t$ as in the hint. We may assume that $t > 6$. Without loss of generality
let $A_2:B_j$ be the first comparison. If $j > g_{t-1}$, the outcome $A_2 < B_j$ will require $\ge t$
more steps. If $j \le g_{t-1}$, the outcome $A_2 < B_j$ would be no problem, so only the case
$A_2 > B_j$ needs study, and we get the most information when $j = g_{t-1}$. If $t = 2k + 1$,
we might have to merge $A_2$ with the $g_t - g_{t-1} = 2^{k-1}$ elements $> B_{g_{t-1}}$, and merge $A_1$
with the $g_{t-1}$ others, but this requires $k + (k + 1) = t$ further steps. On the other hand
if $n = g_t - 1$, we could merge $A_2$ with $2^{k-1} - 1$ elements, then $A_1$ with $n$ elements, in
$(k - 1) + (k + 1)$ further steps, hence $M(2, g_t - 1) \le t$.

The case $t = 2k$ is considerably more difficult; note that $g_t - g_{t-1} > 2^{k-2}$. After
$A_2 > B_{g_{t-1}}$, suppose we compare $A_1:B_j$. If $j > 2^{k-1}$ the outcome $A_1 < B_j$ requires
$k + (k - 1)$ further comparisons (too many). If $j \le 2^{k-1}$, we can argue as before that
$j = 2^{k-1}$ gives most information. After $A_1 > B_{2^{k-1}}$, the next comparisons with $A_1$
might as well be with $B_{2^{k-1}+2^{k-2}}$, then $B_{2^{k-1}+2^{k-2}+2^{k-3}}$; since $2^{k-1} + 2^{k-2} + 2^{k-3} >
g_{t-1}$, the remaining problem is to merge $\{A_1, A_2\}$ with $n - (2^{k-1} + 2^{k-2} + 2^{k-3})$
elements. Of course we needn't make any comparisons with $A_1$ right away; we could
instead compare $A_2:B_{n+1-j}$. If $j \le 2^{k-3}$, we consider the case $A_2 < B_{n+1-j}$, while if
$j > 2^{k-3}$ we consider $A_2 > B_{n+1-j}$. The latter case requires at least $(k - 2) + (k + 1)$
more steps. Continuing, we find that the only potentially fruitful line is $A_2 > B_{g_{t-1}}$,
$A_2 < B_{n+1-2^{k-3}}$, $A_1 > B_{2^{k-1}}$, $A_1 > B_{2^{k-1}+2^{k-2}}$, $A_1 > B_{2^{k-1}+2^{k-2}+2^{k-3}}$, but then
we have exactly $g_{t-5}$ elements left! Conversely, if $n = g_t - 1$, this line works. [Acta
Informatica 1 (1971), 145-158.]

12. The first comparison must be either $\alpha:X_k$ for $1 \le k \le i$, or (symmetrically)
$\beta:X_{n+1-k}$ for $1 \le k \le j$. In the former case the response $\alpha < X_k$ leaves us with
$R_n(k-1, j)$ more comparisons to make; the response $\alpha > X_k$ leaves us with the problem
of sorting $\alpha < \beta$, $Y_1 < \cdots < Y_{n-k}$, $\alpha < Y_{i-k+1}$, $\beta > Y_{n-k-j}$, where $Y_r = X_{r+k}$.

13.  [Computers  in  Number  Theory  (New  York:  Academic  Press,  1971),  263-269.] 

14. [SICOMP 9 (1980), 298-320. The complete solution for $M(4, n)$ was obtained
shortly afterwards by J. Schulte Mönting, who also gave a conjectured solution for
$M(5, n)$, in Theor. Comp. Sci. 14 (1981), 19-37.]
15. Double $m$ until it exceeds $n$. This involves $\lfloor \lg(n/m)\rfloor + 1$ doublings.

16.  All  except  ( m,n ) = (2,8),  (3,6),  (3,8),  (3,10),  (4,8),  (4,10),  (5,9),  (5,10),  when 
it’s  one  over. 

17. Assume that $m \le n$ and let $t = \lg(n/m) - \theta$. Then

$$\lg\binom{m+n}{m} \ge \lg\frac{n^m}{m!} \ge m\lg n - (m\lg m - m + 1) = m(t + \theta) + m - 1 = H(m,n) + \theta m - \lfloor 2^\theta m\rfloor \ge H(m,n) + \theta m - 2^\theta m \ge H(m,n) - m.$$

(The inequality $m! \le m^m 2^{1-m}$ is a consequence of the fact
that $k(m - k) \le (m/2)^2$ for $1 \le k \le m$.)
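A numerical spot check of this inequality, written in Python, is given below. It is a sketch under one stated assumption: we take $H(m,n) = m(t+1) + \lfloor n/2^t\rfloor - 1$ with $t = \lfloor\lg(n/m)\rfloor$, the bound for Algorithm H. The helper names are hypothetical. Integer bit tricks give $t$ and $\lfloor n/2^t\rfloor$ exactly, and a tiny tolerance guards the floating-point logarithm.

```python
import math


def hwang_lin_bound(m: int, n: int) -> int:
    """H(m, n) = m(t + 1) + floor(n / 2^t) - 1 with t = floor(lg(n/m)), 1 <= m <= n."""
    t = (n // m).bit_length() - 1     # floor(lg(n/m)), computed exactly
    return m * (t + 1) + (n >> t) - 1


def check(limit: int = 60) -> bool:
    """Test lg C(m+n, m) >= H(m, n) - m for all 1 <= m <= n <= limit."""
    for m in range(1, limit + 1):
        for n in range(m, limit + 1):
            info = math.log2(math.comb(m + n, m))
            if info < hwang_lin_bound(m, n) - m - 1e-9:   # small float tolerance
                return False
    return True


print(check())   # True
```

For $m = 1$ the bound reduces to $\lfloor\lg n\rfloor + 1 = \lceil\lg(n+1)\rceil$, the cost of binary insertion.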

19. First merge $\{A_1, \ldots, A_m\}$ with $\{B_2, B_4, \ldots, B_{2\lfloor n/2\rfloor}\}$. Then we must insert the
odd elements $B_{2i-1}$ among $\alpha_i$ of the $A$'s for $1 \le i \le \lceil n/2\rceil$, where $\alpha_1 + \alpha_2 + \cdots + \alpha_{\lceil n/2\rceil} \le m$.
The latter operation requires at most $\alpha_i$ operations for each $i$, so at most $m$ more
comparisons will finish the job.

20.  Apply  (12). 

22. R. Michael Tanner [SICOMP 7 (1978), 18-38] has shown that a “fractile insertion”
algorithm makes at most $1.06\lg\binom{m+n}{m}$ comparisons on the average. L. Kollar
[Computers and Artificial Int. 5 (1986), 335-344] has studied the average behavior of
Algorithm H.

23. The adversary keeps an $n \times n$ matrix $X$ whose entries $x_{ij}$ are initially all 1. When
the algorithm asks if $A_i = B_j$, the adversary sets $x_{ij}$ to 0. The answer is “No,” unless
the permanent of $X$ has just become zero. In the latter case, the adversary answers
“Yes” (as it must, lest the algorithm terminate immediately!), and deletes row $i$ and
column $j$ from $X$; the resulting $(n-1) \times (n-1)$ matrix will have a nonzero permanent.
The adversary continues in this way until only a $0 \times 0$ matrix is left.

If the permanent is about to become zero, we can rearrange rows and columns so
that $i = j = 1$ and the matrix has all 1s on the diagonal, yet its permanent vanishes
when $x_{11} \leftarrow 0$; then we must have $x_{1k}x_{k1} = 0$ for all $k > 1$. It follows that at least
$n$ zeros are deleted when the adversary first answers “Yes,” and $n - 1$ the second time,
etc. The algorithm will terminate only after receiving $n$ “Yes” answers to nonredundant
questions, and after asking at least $n + (n-1) + \cdots + 1$ questions [JACM 19 (1972),
649-659]. A similar argument shows that $n + (n-1) + \cdots + (n-m+1)$ questions are
needed to determine that $A \subseteq B$ when $|A| = m \le n = |B|$.

24. The coarse preliminary merge needs at most $m + q - 1$ comparisons, and the
subsequent insertions need at most $t$ each. These upper bounds cannot be decreased.
So the maximum is the same as for Algorithm H (see (19)).

25. The general problem is as hard as the special case where each $x_{ij}$ is 0 or 1 and
$x = \frac12$. Then each comparison is equivalent to looking at the bit $x_{ij}$, and we want to
determine the entire matrix by inspecting the fewest bits. Any merging problem (1)
corresponds to such a 0-1 matrix if we set $x_{ij} = [A_i > B_{n+1-j}]$. (N. Linial and M. Saks,
in J. Algorithms 6 (1985), 86-103, attribute this observation to J. Shearer. A similar
result connects searching and sorting with respect to any partial order.)

SECTION  5.3.3 

1.  Player  11  lost  to  05 ; so  13  was  known  to  be  worse  than  05,  11,  and  12. 

2. Let $x$ be the $t$th largest, and let $S$ be the set of all elements $y$ such that the
comparisons made are insufficient to prove either that $x < y$ or $y < x$. There are
permutations, consistent with all the comparisons made, in which all elements of
$S$ are less than $x$; for we can stipulate that all elements of $S$ are less than $x$ and
embed the resulting partial ordering in a linear ordering. Similarly there are consistent
permutations in which all elements of $S$ are greater than $x$. Hence we don't know the
rank of $x$ unless $S$ is empty.

3.  An  adversary  may  regard  the  loser  of  the  first  comparison  as  the  worst  player 
of  all. 

4. Suppose the largest $t - 1$ elements are $\{a_1, \ldots, a_{t-1}\}$. Any path in the comparison
tree to determine the largest $t$ elements, consistent with this assumption, must include
at least $n - t$ comparisons to determine the largest of the remaining $n - t + 1$ elements.
Such paths have at least $n - t$ binary choice points, so there are at least $2^{n-t}$ of them.
Thus, each of the $n^{\underline{t-1}}$ choices for the largest $t - 1$ elements must appear in at least
$2^{n-t}$ leaves of the tree.
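The leaf count turns into a height bound: a tree with at least $L$ leaves has height at least $\lceil\lg L\rceil$. The Python sketch below assumes the choices for the largest $t-1$ elements are counted in order, $n(n-1)\ldots(n-t+2)$, and each appears in at least $2^{n-t}$ leaves. The function name `top_t_lower_bound` is hypothetical. For $t = 2$ the bound is $n - 2 + \lceil\lg n\rceil$.

```python
import math


def ceil_lg(x: int) -> int:
    """Exact ceil(lg x) for an integer x >= 1."""
    return (x - 1).bit_length()


def top_t_lower_bound(n: int, t: int) -> int:
    """Height bound from counting leaves: n(n-1)...(n-t+2) choices, each in >= 2^(n-t) leaves."""
    leaves = math.perm(n, t - 1) << (n - t)
    return ceil_lg(leaves)     # equals n - t + ceil(lg n(n-1)...(n-t+2))


print(top_t_lower_bound(8, 2))   # 9
print(top_t_lower_bound(5, 3))   # 7
```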

5. In fact, $W_t(n) \le V_t(n) + S(t - 1)$, by exercise 2.

6. Let $g(l_1, l_2, \ldots, l_m) = m - 2 + \lceil\lg(2^{l_1} + 2^{l_2} + \cdots + 2^{l_m})\rceil$, and assume that $f = g$
whenever $l_1 + l_2 + \cdots + l_m + 2m < N$. We shall prove that $f = g$ when $l_1 + l_2 + \cdots +
l_m + 2m = N$. We may assume that $l_1 \ge l_2 \ge \cdots \ge l_m$. There are only a few possible
ways to make the first comparison:

Strategy $A(j,k)$, for $j < k$. Compare the largest element of group $j$ with the largest of
group $k$. This gives the relation

$$f(l_1, \ldots, l_m) \le 1 + f(l_1, \ldots, l_{j-1}, l_j + 1, l_{j+1}, \ldots, l_{k-1}, l_{k+1}, \ldots, l_m) = g(l_1, \ldots, l_{k-1}, l_j, l_{k+1}, \ldots, l_m) \ge g(l_1, \ldots, l_m).$$

Strategy $B(j,k)$, for $l_k > 0$. Compare the largest element of group $j$ with one of the
small elements of group $k$. This gives the relation

$$f(l_1, \ldots, l_m) \le 1 + \max(\alpha, \beta) = 1 + \beta,$$

where

$$\alpha = f(l_1, \ldots, l_{j-1}, l_{j+1}, \ldots, l_m) \le g(l_1, \ldots, l_m) - 1,$$

$$\beta = f(l_1, \ldots, l_{k-1}, l_k - 1, l_{k+1}, \ldots, l_m) \ge g(l_1, \ldots, l_m) - 1.$$

Strategy $C(j,k)$, for $j \le k$, $l_j > 0$, $l_k > 0$. Compare a small element from group $j$ with
a small element from group $k$. The corresponding relation is

$$f(l_1, \ldots, l_m) \le 1 + f(l_1, \ldots, l_{k-1}, l_k - 1, l_{k+1}, \ldots, l_m) \ge g(l_1, \ldots, l_m).$$

The value of $f(l_1, \ldots, l_m)$ is found by taking the minimum right-hand side over all
these strategies; hence $f(l_1, \ldots, l_m) \ge g(l_1, \ldots, l_m)$. When $m > 1$, Strategy $A(m-1, m)$
shows that $f(l_1, \ldots, l_m) \le g(l_1, \ldots, l_m)$, since $g(l_1, \ldots, l_{m-1}, l_m) = g(l_1, \ldots, l_{m-1}, l_{m-1})$
when $l_1 \ge \cdots \ge l_m$. (Proof: $\lceil\lg(M + 2^a)\rceil = \lceil\lg(M + 2^b)\rceil$ for $0 \le a \le b$, when $M$ is a
positive multiple of $2^b$.) When $m = 1$, use Strategy $C(1,1)$.

[S. S. Kislitsyn's paper determined the optimum strategy $A(m-1, m)$ and evaluated
$f(l, l, \ldots, l)$ in closed form; the general formula for $f$ and this simplified proof
were discovered by Floyd in 1970.]
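The upper-bound half of this argument can be checked mechanically. The Python sketch below uses hypothetical helper names. It takes the outcome of Strategy $A(m-1,m)$ in which group $m-1$ wins and absorbs the beaten leader as one more small element. It assumes $f = g$ on the resulting smaller configuration, and confirms that one comparison plus $g$ of that configuration equals $g(l_1, \ldots, l_m)$. With $n$ isolated elements (all $l_i = 0$) it gives $n - 2 + \lceil\lg n\rceil$.

```python
from itertools import combinations_with_replacement


def ceil_lg(x: int) -> int:
    """Exact ceil(lg x) for an integer x >= 1."""
    return (x - 1).bit_length()


def g(ls) -> int:
    """g(l_1, ..., l_m) = m - 2 + ceil(lg(2^{l_1} + ... + 2^{l_m}))."""
    return len(ls) - 2 + ceil_lg(sum(1 << l for l in ls))


def strategy_a_last_two(ls) -> int:
    """1 + g(...) after Strategy A(m-1, m) in the outcome where group m-1 wins:
    it gains one small element (the beaten leader) and group m disappears."""
    ls = sorted(ls, reverse=True)           # l_1 >= ... >= l_m
    merged = ls[:-2] + [ls[-2] + 1]
    return 1 + g(merged)


def check(max_groups: int = 5, max_l: int = 6) -> bool:
    for m in range(2, max_groups + 1):
        for combo in combinations_with_replacement(range(max_l + 1), m):
            if strategy_a_last_two(combo) != g(combo):
                return False
    return True


print(g((0,) * 8))   # 9
print(check())       # True
```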

7. For $j > 1$, if $j + 1$ is in $a'$, $c_j$ is 1 plus the number of comparisons needed to select
the next largest element of $a'$. Similar reasoning applies if $j + 1$ is in $a''$; and $c_1$ is
always 0, since the tree always looks the same at the end.

8. In other words, is there an extended binary tree with $n$ external nodes such that
the sum of the distances to the $t - 1$ farthest internal nodes from the root is less than
the corresponding sum for the complete binary tree? The answer is no, since it is not
hard to show that the $k$th largest element of $g(a)$ is at least $\lfloor\lg(n - k)\rfloor$ for all $a$.
9. (All paths use six comparisons, yet the procedure is not optimum for $\bar V_3(5)$.)


10. (Found manually by trial and error, using exercise 6 to help find fruitful lines.)


11.  See  Information  Processing  Letters  3 (1974),  8-12. 

12. After discarding the smallest of $\{X_1, X_2, X_3, X_4\}$, we have the configuration [diagram]
plus $n - 3$ isolated elements; the third largest of them can be found in $V_3(n - 1) - 1$
further steps.
13. After finding the median of the first $f(n)$ elements, say $X_j$, compare it to each of
the others; this splits the elements into approximately $n/2 - k$ less than $X_j$ and $n/2 + k$
greater than $X_j$, for some $k$. It remains to find the $|k|$th largest or smallest element of
the bigger set, which requires $n/2 + O(|k|\log n)$ further comparisons. The average value
of $|k|$ (consider points uniformly distributed in $[0\,.\,.\,1]$) is $O(1/\sqrt n) + O(n/\sqrt{f(n)})$.
Let $T(n)$ be the average number of comparisons when $f(n) = n^{2/3}$; then $T(n) - n =
T(n^{2/3}) - n^{2/3} + n/2 + O(n^{2/3})$, and the result follows.

It  is  interesting  to  note  that  when  n = 5,  this  method  requires  only  5y|  compar- 
isons on  the  average,  slightly  better  than  the  tree  of  exercise  9. 

14. In general, the $t$ largest can be found in $U_t(n) \le V_t(n - 1) + 1$ comparisons,
by finding the $t$th largest of $\{X_1, \ldots, X_{n-1}\}$ and comparing it with $X_n$, because of
exercise 2. (Kirkpatrick actually proved that (12) is a lower bound for $U_t(n + 1) - 1$.
For larger $t$, an improved bound for $U_t(n)$ was found by J. W. John, SICOMP 17
(1988), 640-647.)

15. $\min(t, n + 1 - t)$. Assuming that $t \le n + 1 - t$, if we don't save each of the first
$t$ words when they are first read in, we may have forgotten the $t$th largest, depending
on the subsequent values still unknown to us. Conversely, $t$ locations are sufficient,
since we can compare a newly input item with the previous $t$th largest, storing the
new item in place of that element if and only if it is greater.
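A minimal Python sketch of the $t$-location scheme for the case $t \le n + 1 - t$ is shown below. The bookkeeping uses the standard `heapq` module, which is our choice and not specified in the text. The heap's root is always the current $t$th largest, so each new item is compared with it and replaces it only if the new item is greater. The function name is hypothetical.

```python
import heapq
from typing import Iterable


def tth_largest_of_stream(items: Iterable[int], t: int) -> int:
    """Keep only t values while reading the input once; return the t-th largest."""
    kept: list[int] = []            # min-heap of the t largest values seen so far
    for x in items:
        if len(kept) < t:
            heapq.heappush(kept, x)        # the first t values must all be saved
        elif x > kept[0]:                  # compare with the current t-th largest
            heapq.heapreplace(kept, x)     # drop it and store the newcomer
    if len(kept) < t:
        raise ValueError("fewer than t items")
    return kept[0]


print(tth_largest_of_stream([5, 1, 9, 3, 7], 2))   # 7
```

When $t > n + 1 - t$ one would instead keep the $n + 1 - t$ smallest items, by symmetry.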

16. The algorithm starts with $(a,b,c,d) = (n, 0, 0, 0)$ and ends with $(0, 1, 1, n-2)$. If
the adversary avoids “surprising” outcomes, the only transitions possible after each
comparison are from $(a,b,c,d)$ to itself or to

$(a-2, b+1, c+1, d)$, if $a \ge 2$;
$(a-1, b, c+1, d)$ or $(a-1, b+1, c, d)$, if $a \ge 1$;
$(a, b-1, c, d+1)$, if $b \ge 2$;
$(a, b, c-1, d+1)$, if $c \ge 2$.

It follows that $\lceil 3a/2\rceil + b + c - 2$ comparisons are needed to get from $(a,b,c,d)$ to
$(0, 1, 1, a+b+c+d-2)$. [Reference: CACM 15 (1972), 462-464. In FOCS 16 (1975), 71-74,
Pohl proved that the algorithm also minimizes the average number of comparisons.]
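The familiar pairing method achieves the $\lceil 3n/2\rceil - 2$ bound. It compares incoming elements in pairs, then tests the smaller of each pair against the current minimum and the larger against the current maximum. The Python sketch below counts comparisons explicitly; the name `min_max` is hypothetical, and we assume this pairing scheme is the algorithm the answer refers to.

```python
def min_max(xs: list[int]) -> tuple[int, int, int]:
    """Return (minimum, maximum, number of comparisons) using the pairing method."""
    n = len(xs)
    if n == 0:
        raise ValueError("empty input")
    comparisons = 0
    if n % 2:                       # odd n: first element starts as both min and max
        lo = hi = xs[0]
        start = 1
    else:                           # even n: one comparison settles the first pair
        comparisons += 1
        lo, hi = (xs[0], xs[1]) if xs[0] < xs[1] else (xs[1], xs[0])
        start = 2
    for i in range(start, n - 1, 2):
        a, b = xs[i], xs[i + 1]
        comparisons += 1            # compare the two newcomers with each other
        if a > b:
            a, b = b, a
        comparisons += 2            # smaller vs. current min, larger vs. current max
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi, comparisons


print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))   # (1, 9, 10)
```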

17. Use (6) first for the largest, then for the smallest, noting that $\lfloor n/2\rfloor$ of the
comparisons are common to both.

18. $V_t(n) \le 18n - 151$, for all sufficiently large $n$.

21. Step 0. Build two knockout trees of sizes $2^k$ and $2^{k-t+1}$.

Step $j$, for $1 \le j \le t$. (At this point we have output the largest $j - 1$ elements. The
remaining elements, together with a set of dummy placeholders that each equal $-\infty$,
now appear in two knockout trees $A$ and $B$, where $A$ has $2^k$ leaves and $B$ has $2^{k-t+j}$.)
Let $a$ be the champion of $A$, and assume that $a$ has beaten $a_0, a_1, \ldots, a_{k-1}$, where
$a_i$ is a champion of $2^i$ elements. Similarly, let $b$ and $b_0, b_1, \ldots, b_{k-t+j-1}$ be the
champion and subchampions of $B$. If $j = t$, output $\max(a, b)$ and stop. Otherwise,
“grow” another level at the bottom of $B$ by introducing $2^{k-t+j}$ dummies who each
have lost their first game to the players of $B$. (Our strategy will be to merge $B$
into $A$, if possible, by exchanging it with the subtree $A'$ of $A$ that contains $a_0, a_1,
\ldots, a_{k-t+j}$; notice that $A'$, like the newly enlarged $B$, is a knockout tree with $2^{k-t+j+1}$
leaves.) Compare $b$ to $a_{k-t+j+1}$, then compare the winner to $a_{k-t+j+2}$, etc., until $c =
\max(b, a_{k-t+j+1}, \ldots, a_{k-1})$ has been found. Case 1, $b < c$: Output $a$ and interchange
$B$ with $A'$. Case 2, $b = c$ and $b < a$: Output $a$ and interchange $B$ with $A'$. Case 3, $b = c$
and $b > a$: Output $b$. After handling these three cases we are left with (possibly new)
knockout trees $A$ and $B$ in which the champion of $B$ has just been output. Remove that
element from $B$ and replace it by $-\infty$, making any necessary comparisons to restore
the knockout tournament structure (as in tree selection). This completes Step $j$.

Step 0 makes $2^k - 1 + 2^{k+1-t} - 1$ comparisons, and Step $t$ makes 1. Steps $1, 2,
\ldots, t - 1$ each make at most $k - 1$ comparisons, except in Case 2 when there might
be $k$. But whenever Case 2 occurs, we'll save one comparison the next time we're in
Case 1 or Case 2, because $a_0$ will then be $-\infty$. Thus the first $t - 1$ steps make at most
$(t - 1)(k - 1) + 1$ comparisons altogether.

By exercise 3 we have $W_t(n) \le n + (t - 1)(k - 1)$ for all $n \le 2^k + 2^{k+1-t}$, when
$k \ge t \ge 2$. If $n > 2^k + t - 2$, exercise 4 says that $W_t(n) \ge n - t + \lceil\lg n^{\underline{t-1}}\rceil$, which
is $n - t + (t-1)k + 1$ if $t \ge 3$. Thus the method is optimum for $2^k + t - 2 < n \le 2^k + 2^{k+1-t}$
when $k \ge t \ge 3$. (Also for several smaller values of $n$, if $t$ is large.)

A similar method, which uses a reserved element instead of $-\infty$ when rebuilding $B$
at the end of steps $1, \ldots, t - 2$ (see the proof of (11)), proves that $V_t(n) \le n + (t - 1)(k - 1)$
when $n \le 2^k + 2^{k+1-t} + t - 2$ and $k \ge t \ge 3$. [See J. Algorithms 5 (1984), 557-578.]

22. In general when $2^r \cdot 2^k \le n + 2 - t < (2^r + 1)\cdot 2^k$ and $t < 2^r < 2t$, this procedure
starting with $t + 1$ knockout trees of size $2^k$ will yield $\lfloor (t - 1)/2\rfloor$ fewer comparisons
than (11), since at least this many of the comparisons that were used to find the
minimum in (ii) can be “reused” in (iii).

23. According to (15), the quantity $V_{\lceil n/2\rceil}(n)/n$ is bounded below by 2 as $n \to \infty$.
But D. Dor and U. Zwick have shown that the actual lower limit is strictly greater
than 2, while the upper limit is less than 2.942 [SICOMP 28 (1999), 1722-1758; 14
(2001), 312-325]. They also have proved an asymptotic upper bound

$$V_{\alpha n}(n) \le \Bigl(1 + \alpha\lg\frac{1}{\alpha} + O\Bigl(\alpha\log\log\frac{1}{\alpha}\Bigr)\Bigr)n,$$

which is not extremely far from (15) when $\alpha$ is small [Combinatorica 16 (1996), 41-58].

24. Since $W_t(n) = n + O(t\log n)$ by Eq. (6), the statement in the hint is surely true
when $t \le \sqrt n/\ln n$. Suppose that statement holds for $n$, and let $u$ and $v$ have ranks
$t_- = \lfloor t - \sqrt{t\ln n}\rfloor$ and $t_+ = \lceil t + \sqrt{t\ln n}\rceil$ in the first $n$ of $2n$ randomly ordered elements.
(The smallest element has rank 1.) Compare the other $n$ elements to $v$, and compare
those less than $v$ also to $u$. The probability $p_s$ that an element $x$ of rank $t$ in the
first $n$ has rank $s$ overall is $\binom{s-1}{t-1}\binom{2n-s}{n-t}/\binom{2n}{n}$. The average value of $s$ is $\sum s\,p_s = t(2n+1)/(n+1)$;
this is the average number of elements $\le x$, hence the average number of comparisons
to $u$ is $t + O(n\log n)^{1/2}$. Let $u$ and $v$ have ranks $s_-$ and $s_+$ among all
$2n$ elements, and let $T_- = \lfloor 2t - \sqrt{2t\ln 2n}\rfloor$, $T_+ = \lceil 2t + \sqrt{2t\ln 2n}\rceil$. If $s_- \le T_-$ and
$s_+ \ge T_+$, we can find the elements of ranks $T_-$ and $T_+$ by selecting from the $s_+ - s_- + 1$
elements between $u$ and $v$. We will prove that it's very unlikely to have $s_- > T_-$ or
$s_- < T_- - 2\sqrt{n\ln n}$ or $s_+ < T_+$ or $s_+ > T_+ + 2\sqrt{n\ln n}$; therefore $O(n\log n)^{1/2}$ further
comparisons will almost always suffice. The hint will follow by induction on $n$ if we
can show that “very unlikely” means “with probability $O(n^{-1-\epsilon})$ for all sufficiently
large $n$.”

Notice that $p_{s+1}/p_s = s(n - s + t)/\bigl((s + 1 - t)(2n - s)\bigr)$ decreases as $s$ increases
from $t$ to $n + t$, and it is $\le 1$ if and only if $s \ge 2n(t - 1)/(n - 1)$; it is $\le 1 -
\frac12 cn^{-1/2} + O(n^{-1})$ when $s = s(c) = 2t + ct(n - t)/n^{3/2}$. Therefore the probability that
$s \ge s(c)$ is $\le 2c^{-1}n^{1/2}p_{s(c)}\bigl(1 + O(n^{-1/2})\bigr)$. Similarly, $p_{s-1}/p_s \le 1 - \frac12 cn^{-1/2} - O(n^{-1})$
when $s = s'(c) = 2t - 1 - c(t - 1)(n + 1 - t)/n^{3/2}$, so $s \le s'(c)$ with probability
$\le 2c^{-1}n^{1/2}p_{s'(c)}\bigl(1 + O(n^{-1/2})\bigr)$. In the cases we need, the relevant values of $c$ are
$\ge .55\,n^{3/2}(\ln n)^{1/2}t^{-1/2}(n - t)^{-1}$ for all large $n$, and Stirling's approximation implies
that $p_{s(c)}$ and $p_{s'(c)}$ are both

$$O\bigl(n^{1/2}s^{-1/2}(2n - s)^{-1/2}\bigr)\exp\bigl(-2sc^2(n - t)^2/n^3 - 2(2n - s)c^2t^2/n^3\bigr)$$

$$\le O\bigl(t^{-1/2}\exp(-4t(n - t)c^2/n^2)\bigr) \le O(t^{-1/2}n^{-1.2}).$$

Thus the probability $O\bigl(n^{-1.2}(\log n)^{1/2}\bigr)$ is indeed very unlikely. [A similar construction
appeared in CACM 18 (1975), 165-172, but the analysis was incorrect.]
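The distribution $p_s = \binom{s-1}{t-1}\binom{2n-s}{n-t}/\binom{2n}{n}$ can be checked exactly with rational arithmetic. The Python sketch below uses hypothetical helper names and verifies three things for small $n$: the $p_s$ sum to 1, the mean rank is $t(2n+1)/(n+1)$, and consecutive ratios obey the formula $p_{s+1}/p_s = s(n-s+t)/\bigl((s+1-t)(2n-s)\bigr)$ used in the tail estimates.

```python
from fractions import Fraction
from math import comb


def rank_distribution(n: int, t: int) -> dict[int, Fraction]:
    """p_s for t <= s <= n + t: the chance that the element of rank t among the
    first n of 2n randomly ordered distinct elements has rank s among all 2n."""
    total = comb(2 * n, n)
    return {s: Fraction(comb(s - 1, t - 1) * comb(2 * n - s, n - t), total)
            for s in range(t, n + t + 1)}


def check(n: int, t: int) -> bool:
    p = rank_distribution(n, t)
    if sum(p.values()) != 1:
        return False
    if sum(s * ps for s, ps in p.items()) != Fraction(t * (2 * n + 1), n + 1):
        return False
    # Ratio of consecutive probabilities, as used in the tail estimates.
    return all(p[s + 1] / p[s] == Fraction(s * (n - s + t), (s + 1 - t) * (2 * n - s))
               for s in range(t, n + t))


print(all(check(n, t) for n in range(1, 12) for t in range(1, n + 1)))   # True
```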

25. Given a selection algorithm and a permutation $\pi$ of $\{1, \ldots, n\}$, let's charge each
comparison $\pi_i:\pi_j$ to $\pi_i$ if $|\pi_i - t| > |\pi_j - t|$; if $|\pi_i - t| = |\pi_j - t|$, we charge $\frac12$ to each.
A charge to $\pi_i$ is called useful if $\pi_i < \pi_j \le t$ or $\pi_i > \pi_j \ge t$; otherwise it's useless. Let
$x_k$ be the total charge to $k$. Then the total number of comparisons is $x_1 + \cdots + x_n$.
Clearly $x_t = 0$; but $x_k \ge 1$ for all $k \ne t$, because every element other than $t$ has a
useful charge. We will prove that $E x_{t+k} + E x_{t-k} \ge 3$ for $0 < k < t$.

Let $A_k(\pi) = [\text{the first charge to } t + k \text{ was useless}]$. Then $A_k(\pi) = 1 - A_{-k}(\pi')$,
where $\pi'$ is like $\pi$ but with the elements $(t - k, \ldots, t + k - 1, t + k)$ replaced respectively
by $(t - k + 1, \ldots, t + k, t - k)$. Therefore $EA_k + EA_{-k} = 1$.

Let $B_k(\pi) = [\text{the first charge to both } t + k \text{ and } t - k \text{ was } \frac12\text{, and } t + k \text{ received
its second charge before } t - k \text{ did}]$. Also let $C_k(\pi) = [x_{t+k} \ge 2 + A_k]$. Then $B_k(\pi) \le
C_k(\pi')$, where $\pi'$ is like $\pi$ but with the elements $(t - k, t - k + 1, \ldots, t + k - 1)$ replaced
by $(t + k - 1, t - k, \ldots, t + k - 2)$. Similarly, $B_{-k}(\pi) \le C_{-k}(\pi'')$, where $\pi''$ is obtained
from $\pi$ by changing $(t - k + 1, \ldots, t + k - 1, t + k)$ to $(t - k + 2, \ldots, t + k, t - k + 1)$. It follows
that $EB_k \le EC_k$ and $EB_{-k} \le EC_{-k}$.

The proof is completed by observing that $x_{t-k} + x_{t+k} \ge 2 + A_k + A_{-k} - B_k - B_{-k} +
C_k + C_{-k}$. [See JACM 36 (1989), 270-279, for further results.]

The  upper  bound  in  (17)  also  has  a matching  lower  bound:  Andrew  and  Frances 
Yao  proved  that  Vt(n)  > n+|f  (In Inn  — lnt  — 9)  for  t > 1 and  n > (8t)184,  in  SICOMP 
11  (1982),  428-447. 

26. (a) Let the vertices of the two types of components be designated $a$; $b < c$. The
adversary acts as follows on nonredundant comparisons: Case 1, $a:a'$, make an arbitrary
decision. Case 2, $x:b$, say that $x > b$; all future comparisons $y:b$ with this particular $b$
will result in $y > b$, otherwise the comparisons are decided by an adversary for $U_t(n-1)$,
yielding $\ge 2 + U_t(n - 1)$ comparisons in all. This reduction will be abbreviated “let
$b = \min$; $2 + U_t(n - 1)$.” Case 3, $x:c$, let $c = \max$; $2 + U_{t-1}(n - 1)$.

(b) Let the new types of vertices be designated $d_1, d_2 < e$; $f < g < h > i$. Case 1,
$a:a'$ or $c:c'$, arbitrary decision. Case 2, $a:c$, say that $a < c$. Case 3, $x:b$, let $b = \min$;
$2 + U_t(n - 1)$. Case 4, $x:d$, let $d = \min$; $2 + U_t(n - 1)$. Case 5, $x:e$, let $e = \max$;
$3 + U_{t-1}(n - 1)$. Case 6, $x:f$, let $f = \min$; $2 + U_t(n - 1)$. Case 7, $x:g$, let $f$ and $g = \min$;
$3 + U_t(n - 2)$. Case 8, $x:h$, let $h = \max$; $3 + U_{t-1}(n - 1)$. Case 9, $x:i$, let $i = \min$;
$2 + U_t(n - 1)$.

(c) For $t = 1$ we have $U_t(n) = n - 1$, so the inequality holds. For $1 < t < n/2 - 1$,
use induction and (b). For $t = (n - 1)/2$, use induction and (a). For $t = n/2$,
$U_t(n - 1) = U_{t-1}(n - 1)$; use induction and (a).

27. (a) The height $h$ satisfies $2^h \ge \sum_l 1 \ge \sum_l \Pr(l)/p = 1/p$.

(b) If $r < t$, we reach A3 after at least $n - |S_0| - |T_0| = n - |S_0| - r$ flips. The $t$th
largest element will be either the smallest or largest element of $Q$, and the elements of
$Q$ have not yet been compared to each other, so we will need at least $|Q| - 1$ more flips.
If $|S_0| < q$ we have $|Q| = r$, and if not we have $|Q| \ge |S_0| - |C(y_0)| + 1 \ge |S_0| - (q - r) + 1$;
so in both cases at least $n - q$ flips will be made. There are $n + 1 - t$ sets $T$ containing the
$t - 1$ largest elements determined by a given leaf, and for every such $T$ the probability
of reaching that leaf is either zero or $2^{-f}/\binom{n}{t}$, where $f \ge n - q$ is the number of flips
corresponding to $T$. [This adversary is implicit in the paper of Bent and John, STOC
17 (1985), 213-216.]

(c) If $t \le r$, change $t$ to $n + 1 - t$; this will make $t > r$ when $r$ maximizes the
right-hand side, since $r$ will be $O(\sqrt n)$. If it is possible to reach A3 with $|C(y)| \ge q - r$
for all $y \in T_0$, the algorithm will make $n - 1$ comparisons to relate the $t$th largest
element to all the others, in addition to at least $(r - 1)(q - r + 1)$ comparisons that it
made between $S$ and $T \setminus \{y_0\}$.

(d) Choose $r = \lceil\sqrt m\,\rceil$ and $q = 2r - 2$. (It is slightly better to let $q = r +
\lfloor\sqrt m + \frac12\rfloor - 2$; this choice maximizes the lower bound derived in (c).)


SECTION  5.3.4 


1. (When $m = 2k - 1$ is odd it is best to have $V_k$ followed by $V_{k+1}$, $W_{k+1}$, $V_{k+2}$, ...
instead of by $W_{k+1}$, $V_{k+1}$, $W_{k+2}$, ... in the diagram. This change is valid because the
swapped lines are being compared to each other.)

[Diagrams: (3,5) odd-even merge; Pratt eight-sort.]

2. The increment $h$ needs $2 - [2h > n]$ levels; see the diagram above for $n = 8$.

3.  C(m,m—  1)  = C(m,m ) — 1,  for  m > 1. 

4. If $T(6) = 4$, there would be three comparators acting at each time, since $\hat S(6) = 12$.
But then removing the bottom line and its four comparators would give $\hat S(5) \le 8$,
a contradiction. [The same argument yields $T(7) = T(8) = 6$. Ian Parberry has shown
by exhaustive computer search that $T(9) = T(10) = 7$; see Math. Systems Theory 24
(1991), 101-116.]

5. Let $f(n) = f(\lceil n/2\rceil) + 1 + \lceil\lg\lceil n/2\rceil\rceil$, if $n \ge 2$. Then $f(n) = (1 + \lceil\lg n\rceil)\lceil\lg n\rceil/2$
by induction on $n$.
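The induction can be spot-checked by transcribing the recurrence directly into Python, with the base case $f(1) = 0$ taken as our assumption. The names `delay` and `closed_form` are hypothetical. The check works because $\lceil\lg\lceil n/2\rceil\rceil = \lceil\lg n\rceil - 1$ for $n \ge 2$, so each halving step adds exactly $\lceil\lg n\rceil$.

```python
def ceil_lg(x: int) -> int:
    """Exact ceil(lg x) for an integer x >= 1."""
    return (x - 1).bit_length()


def delay(n: int) -> int:
    """f(n) = f(ceil(n/2)) + 1 + ceil(lg ceil(n/2)) for n >= 2, with f(1) = 0 assumed."""
    if n == 1:
        return 0
    half = (n + 1) // 2          # ceil(n/2)
    return delay(half) + 1 + ceil_lg(half)


def closed_form(n: int) -> int:
    L = ceil_lg(n)
    return (1 + L) * L // 2


print(all(delay(n) == closed_form(n) for n in range(1, 5000)))   # True
print(delay(1000))                                                 # 55
```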

6. We may assume that each stage makes $\lfloor n/2\rfloor$ comparisons (extra comparisons
can't hurt). Since $T(6) = 5$, it suffices to show that $T(5) = 5$. After two stages when
$n = 5$, we cannot avoid the partial orderings [diagram] or [diagram], which cannot be sorted in
two more stages.

7. Assume that the input keys are $\{1, 2, \ldots, 10\}$. The key fact is that after the first 16
comparators, lines 2, 3, 4, and 6 cannot contain 8 or 9, nor can they contain both 6
and 7. (Notice that the modified network has delay 8.)

8.  Straightforward  generalization  of  Theorem  F. 
9. $\hat M(3,3) \ge \hat S(6) - 2\hat S(3)$; $\hat M(4,4) \ge \hat S(8) - 2\hat S(4)$; $\hat M(5,5) \ge 2\hat M(2,3) + 3$ by
exercise 8; and $\hat M(2,3) \ge \hat S(5) - \hat S(2) - \hat S(3)$. Similarly $\hat M(3,4) = 8$. But what are
$\hat M(3,5)$ and $\hat M(4,5)$?

10. The hint follows by the method of proof in Theorem Z. Hence the number of 0s
in the even subsequence minus the number of 0s in the odd subsequence is $\pm1$ or 0.

11. (Solution by M. W. Green.) The network is symmetric in the sense that, whenever
$z_i$ is compared to $z_j$, there is a corresponding comparison of $z_{2^t-1-j}$ with $z_{2^t-1-i}$. Any
symmetric network capable of sorting a sequence $(z_0, \ldots, z_{2^t-1})$ will also sort the
sequence $(-z_{2^t-1}, \ldots, -z_0)$.

Batcher has observed that the network will actually sort any cyclic shift $(z_j, z_{j+1},
\ldots, z_{2^t-1}, z_0, \ldots, z_{j-1})$ of a bitonic sequence. This is a consequence of the 0-1 principle.

[These results do not hold for bitonic sorters when the order is not a power of 2. For
example, Fig. 52 does not sort (0, 0, 0, 0, 0, 1, 0). Batcher's original definition of bitonic
sequences was more complicated and less useful than the definition adopted here.]
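Batcher's cyclic-shift observation can be checked by brute force on 0-1 inputs. The Python sketch below builds a standard bitonic sorter on $2^t$ lines, which we assume matches the network discussed here. It compares lines $i$ and $i + 2^{t-1}$, then $i$ and $i + 2^{t-2}$, and so on, down to $i$ and $i + 1$. We take “bitonic” to mean nonincreasing then nondecreasing, also an assumption; since every cyclic shift is tested, the other orientation is covered too. Function names are hypothetical.

```python
def bitonic_sorter(z: list[int]) -> list[int]:
    """Comparator network on 2^t lines: for step = 2^(t-1), ..., 2, 1, compare
    line i with line i + step whenever bit 'step' of i is 0 (smaller goes low)."""
    n = len(z)
    if n & (n - 1):
        raise ValueError("the number of lines must be a power of 2")
    z = list(z)
    step = n // 2
    while step >= 1:
        for i in range(n):
            if i & step == 0:
                j = i + step
                if z[i] > z[j]:
                    z[i], z[j] = z[j], z[i]
        step //= 2
    return z


def bitonic_zero_one(n: int):
    """Every cyclic shift of every 0-1 sequence 1^a 0^b 1^c of length n."""
    for a in range(n + 1):
        for b in range(n + 1 - a):
            base = [1] * a + [0] * b + [1] * (n - a - b)
            for shift in range(n):
                yield base[shift:] + base[:shift]


print(all(bitonic_sorter(s) == sorted(s)
          for t in range(5) for s in bitonic_zero_one(2 ** t)))   # True
print(bitonic_sorter([1, 4, 6, 8, 7, 5, 3, 2]))                   # [1, 2, 3, 4, 5, 6, 7, 8]
```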

12. $x \vee y$ is (consider 0-1 sequences), but not $x \wedge y$ (consider $(3, 1, 4, 5) \wedge (6, 7, 8, 2)$).

13. A perfect shuffle has the effect of replacing $z_i$ by $z_j$, where the binary representation
of $j$ is that of $i$ rotated cyclically to the right one place (see exercise 3.4.2-13). Consider
shuffling the comparators instead of the lines; then the first column of comparators acts
on the pairs $z[i]$ and $z[i \oplus 2^{t-1}]$, the next column on $z[i]$ and $z[i \oplus 2^{t-2}]$, ..., the $t$th
column on $z[i]$ and $z[i \oplus 1]$, the $(t + 1)$st column on $z[i]$ and $z[i \oplus 2^{t-1}]$ again, etc.
Here $\oplus$ denotes exclusive-or on the binary representation. This shows that Fig. 57 is
equivalent to Fig. 56; after $s$ stages we have groups of $2^s$ elements that are alternately
sorted and reverse-sorted.

C. G. Plaxton and T. Suel [Math. Systems Theory 27 (1994), 491-508] have shown
that any such network requires at least $\Omega\bigl((\log n)^2/\log\log n\bigr)$ levels of delay.

14. (a) Let $y_{i_s} = x_{j_s}$, $y_{j_s} = x_{i_s}$, $y_k = x_k$ for $i_s \ne k \ne j_s$; then $y\alpha^s = x\alpha$. (b) This
is obvious unless the set $\{i_s, j_s, i_t, j_t\}$ has only three distinct elements; suppose that
$i_s = i_t$. Then if $s < t$ the first $s - 1$ comparators have $(i_s, j_s, j_t)$ replaced, respectively,
by $(j_s, j_t, i_s)$ in both $(\alpha^s)^t$ and $(\alpha^t)^s$. (c) $(\alpha^s)^s = \alpha$, and $\alpha^1 = \alpha$, so we can assume
that $s_1 > s_2 > \cdots > s_k > 1$. (d) Let $\beta = \alpha[i:j]$; then $g_\beta(x_1, \ldots, x_n) = (x_i \vee x_j) \wedge
\bigl(g_\alpha(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n) \vee g_\alpha(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_n)\bigr)$. Iterating this identity
yields the result. (e) $f_\alpha(x) = 1$ if and only if no path in $G_\alpha$ goes from $i$ to $j$ where
$x_i > x_j$. If $\alpha$ is a sorting network, the conjugates of $\alpha$ are also; and $f_\alpha(x) = 0$ for all
$x$ with $x_i > x_{i+1}$. Take $x = e^{(i)}$; this shows that $G$ has an arc from $i$ to $k_1$ for some
$k_1 \ne i$. If $k_1 \ne i + 1$, $x = e^{(i)} \vee e^{(k_1)}$ shows that $G$ has an arc from $i$ or $k_1$ to $k_2$ for some
$k_2 \notin \{i, k_1\}$. If $k_2 \ne i + 1$, continue in the same way until finding a path in $G$ from $i$
to $i + 1$. Conversely if $\alpha$ is not a sorting network, let $x$ be a vector with $x_i > x_{i+1}$ and
$g_\alpha(x) = 1$. Some conjugate $\alpha'$ has $f_{\alpha'}(x) = 1$, so $G_{\alpha'}$ can have no path from $i$ to $i + 1$.
[In general, $(x\alpha)_i \le (x\alpha)_j$ for all $x$ if and only if $G_{\alpha'}$ has an oriented path from $i$ to $j$
for all $\alpha'$ conjugate to $\alpha$.]

15.  [1 : 4] [3 : 2]  [1 : 3] [2 : 4] [2:3]. 

16.  The  process  clearly  terminates.  Each  execution  of  step  T2  has  the  effect  of 
interchanging  the  i,th  and  j(/th  outputs,  so  the  result  of  the  algorithm  is  to  permute  the 
output  lines  in  some  way.  Since  the  resulting  (standard)  network  makes  no  change  to  the 
input  (1, 2, . . . , n),  the  output  lines  must  have  been  returned  to  their  original  position. 

17.  Make  the  network  standard  by  the  algorithm  of  exercise  16;  then  by  considering 
the  input  sequence  (1, 2, . . . , n),  we  see  that  standard  selection  networks  must  take  the 
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t largest  elements  into  the  t highest- numbered  lines;  and  a Vt(n)  network  must  take 
the  fth  largest  into  line  n + 1 — t.  Apply  the  zero-one  principle. 

18.  The  proof  in  Theorem  A shows  that  Vt(n)  > (n  — t)  \ lg(f  + 1)]  + [lgf]. 

19.  The  network  [1 : n]  [2 : n] . . . [1 : 3]  [2 : 3]  selects  the  smallest  two  elements  with  2n  — 4 
comparators;  add  [1:2]  for  V2  (n).  The  lower  bounds  come  from  the  proof  of  Theorem  A 
(see  the  previous  answer). 

20.  (a)  First  note  that  V3(n)  > V3(n  — 1)  + 2 when -m 

n > 4:  By  symmetry  the  first  comparator  may  be  •- 1 

assumed  to  be  [l:n];  after  this  must  come  a network  T "•  f* 

to  select  the  third  largest  of  (x2,X3, . . . ,x„),  and  an-  — " M 

other  comparator  touching  line  1.  On  the  other  hand,  J * * ^ ^ ^ M 

^3(5)  < 7,  since  four  comparators  find  the  min  and  <HH> 

max  of  {xi,X2,X3,Xi},  then  we  sort  the  other  three.  — i- i- 

(b)  A subtle  construction  by  M.  W.  Green,  shown  — • *— i-  — 

for  n = 11,  does  the  job.  (Equality  probably  holds.)  — • 11 — 

21.  False;  consider,  for  example,  the  two  networks  [1:2]  [3: 4]  [2: 3]  [1:4]  [1:2]  [3: 4]  and 
[1 : 2] [3 : 4] [2 : 3]  [3 : 4] [1 : 4] [1 : 2] [3:4].  (However,  N.  G.  de  Bruijn  proved  in  Discrete  Math. 
9 (1974),  337,  that  new  comparators  do  not  mess  up  sorting  networks  that  are  primitive 
in  the  sense  of  exercise  36.) 

22.  (a)  By  induction  on  the  length  of  a,  since  Xi  < y,  and  Xj  < yj  implies  that 
Xi  A Xj  < y%  A yj  and  xt  V x3  < y,  V yj . (b)  By  induction  on  the  length  of  a,  since 
(xiAxj)(yi/\yj)  + (xi\/Xj)(yi  V yj)  > x3yi  + Xjyj.  [Consequently  u(xAy)  < v(xaAya), 
an  observation  due  to  W.  Shockley.] 

23.  Let  Xk  = 1 if  and  only  if  Pk  > j,  yk  — 1 if  and  only  if  pk  > j;  then  ( xa)k  = 1 if 
and  only  if  ( pa)k  > j,  etc. 

24.  The  formula  for  l[  is  obvious  and  for  l'j  take  z = x A y as  in  the  hint  and  observe 
that  ( za)i  = (za)j  = 0 by  exercise  21.  Adding  additional  Is  to  z shows  the  existence  of 
a permutation  p with  (pa')j  < C(z),  by  exercise  23.  The  relations  for  ux  and  u'  follow 
by  reversing  the  order. 

25.  (Solution  by  H.  Shapiro.)  Let  p and  q be  permutations  with  (pa)k  — Ik  and 
( qa)k  = Uk  ■ We  can  transform  p into  q by  repeatedly  interchanging  pairs  (i,  i + 1)  of 
adjacent  integers;  such  an  interchange  in  the  input  affects  the  fcth  output  by  at  most  ±1. 

26.  There  is  a one-to-one  correspondence  that  takes  the  element  {pi, . . . , p„)  of  Pna 
into  the  “covering  sequence”  covers  £(1)  covers  . . . covers  x(n\  where  the  x(l>  are 
in  Dncc,  in  this  correspondence,  x^~1>  = x1-*1  V e.^>  if  and  only  if  pj  = i.  For  example, 
(3, 1,4,2)  corresponds  to  the  sequence  (1,1, 1,1)  covers  (1,0, 1,1)  covers  (1,0, 1,0) 
covers  (0,0, 1,0)  covers  (0,0, 0,0).  [Andrew  Yao  observes  that  consequently  it  suffices 
to  test  a sorting  network  on  (^n"2j)  ~ 1 suitably  chosen  permutations.  For  example, 
any  4-network  that  sorts  (4, 1, 2, 3),  (3, 1, 4, 2),  (3,4, 1,  2),  (2, 4, 1, 3),  and  (2, 3,4, 1)  sorts 
everything.  See  exercise  6.5-1;  see  also  exercise  56.] 

27.  The  principle  holds  because  (xa)i  is  the  ith  smallest  element  of  x.  If  x and  y 
denote  different  columns  of  a matrix  whose  rows  are  sorted,  so  that  Xi  < yt  for  all  i, 
and  if  xa  and  ya  denote  the  result  of  sorting  the  columns,  the  stated  principle  shows 
that  ( xa)i  < ( ya)i  for  all  i,  since  we  can  choose  i elements  of  x in  the  same  rows  as  any 
i given  elements  of  y.  [We  have  used  this  principle  to  prove  the  invariance  property  of 
shellsort,  Theorem  5. 2. IK.  Further  exploitation  of  the  idea  appears  in  an  interesting 
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paper  by  David  Gale  and  R.  M.  Karp,  J.  Computer  and  System  Sciences  6 (1972), 
103-115.  The  fact  that  column  sorting  does  not  mess  up  sorted  rows  was  apparently 
first  observed  in  connection  with  the  manipulation  of  tableaux;  see  Hermann  Boerner, 
Darstellung  von  Gruppen  (Springer,  1955),  Chapter  V,  §5.] 

28.  If  {xi1  are  the  t largest  elements,  then  xt1  A ...  A Xit  is  the  tth  largest.  If 

{x^ , . . . , Xit } are  not  the  t largest,  then  xll  A ...  A Xit  is  less  than  the  tth  largest. 

29.  (xi  A yi,  (x2  A yi)  V ( xi  A y2),  ( x3  A yi)  V (x2  A y2)  V (xr  A y3),  yi  V (x3  A y2)  V 
(x2  A y3)  V (xi  A y4),  y2  V (x3  A y3)  V (x2  A y4)  V (xi  A y5),  y3  V (x3  A y4)  V (x2  A y5)  V Xi , 
y4  v (x3  A y5)  V x2,  y5  V x3). 

30.  Applying  the  distributive  and  associative  laws  reduces  any  formula  to  V’s  of  A’s; 
then  the  commutative,  idempotent,  and  absorption  laws  lead  to  canonical  form.  The 
Si  are  precisely  those  sets  S such  that  the  formula  is  1 when  Xj  = [j  € 5]  while  the 
formula  is  0 when  Xj  — [j  € 5']  for  any  proper  subset  S'  of  S. 

31.  (54  = 166.  R.  Church  [Duke  Math.  J.  6 (1940),  732-734]  found  5s  = 7579,  M.  Ward 
[Bull.  Amer.  Math.  Soc.  52  (1946),  423]  found  dg  = 7828352,  and  the  next  values 
are  S7  = 2414682040996,  5a  = 56130437228687557907786  [R.  Church,  Notices  Amer. 
Math.  Soc.  12  (1965),  724;  J.  Berman  and  P.  Kohler,  Mitteilungen  Math.  Seminar 
GieBen  121  (1976),  103-124;  D.  Wiedemann,  Order  8 (1991),  5-6],  The  asymptotic 
formula  S2m  = exp((2,^_1)  ln2  + (™+1)/2m+1  + j^{m  - l)y/m/n  + 0(m_1/2))  has 
been  established  by  A.  D.  Korshunov  and  A.  A.  Sapozhenko,  with  a similar  formula  for 
S2m+i',  see  Russian  Math.  Surveys  58  (2003),  929-1001,  Theorem  1.8. 

32.  Gt+i  is  also  the  set  of  all  strings  dip  where  6 and  ip  are  in  Gt  and  0 C ip  as  vectors 
of  0s  and  Is.  It  follows  that  Gt  is  the  set  of  all  strings  z0  . . . z2t_1  of  0s  and  Is  having 
the  property  that  zt  < Zj  whenever  the  binary  representation  of  i is  “C”  the  binary 
representation  of  j in  the  0-1  vector  sense.  Each  element  z0  . . . z2t_x  of  Gt,  except 
00  ...  0 and  11 ...  1,  represents  a A-V  function  }{x i, . . . , xt)  from  D2t  into  {0, 1},  under 
the  correspondence  f(x i, . . . , Xt)  = 2(xi...xt)2- 

33.  If  such  a network  existed  we  would  have  (xi  A x2)  V (x2  A x3)  V (x3  A X'4)  = 
f(xi  Ax2,  Xi  Vi2,  x3,  x4)  or  /(x i A x3, x2, xj  Vx3, x4)  or  . . . or  f(x i,x2,  x3  Ax4,x3  Vx4) 
for  some  function  /.  The  choices  (xi, x2, x3, x4)  = (x,x,l,0),  (x,0, x,l),  (x,  1,0,  x), 
(l,x,x,0),  (l,x,0,  x),  (0,  l,x,x)  show  that  no  such  / exists. 

34.  Yes;  after  proving  this,  you  are  ready  to  tackle  the  network  for  n — 16  in  Fig.  49 
(unless  you  simply  checked  all  2n  bit  vectors  by  brute  force  using  Theorem  Z). 

35.  Otherwise  the  permutation  in  which  only  i and  i+  1 are  misplaced  would  never  be 
sorted.  Let  Dk  be  the  number  of  comparators  [i : i+k]  in  a standard  sorting  network. 
Then  Di  + 2D2  + D3  > 2(n  — 2)  since  there  must  be  two  comparators  from  {i,  i+1 } to 
{*+2,  *+3),  for  1 < i < n — 3,  as  well  as  [1:2]  and  [n— 1 :n].  Similarly  D\  + 2 D2  + • • • + 

kDk  + {k—  l)Dk+i  H \-D2k-i  > k(n  - k),  a formula  suggested  by  J.  M.  Pollard.  We 

can  also  prove  that  2Di  + D2  > 3n  — 4:  If  we  strike  out  the  first  comparators  of  the  form 
[j : j+1]  for  all  j there  must  be  at  least  one  more  comparator  lying  within  {i,  i-fl,  i+2}, 
for  1 < i < n — 2.  Similarly  kDi  + (k  — 1)D2  + f Dk  > S(k  + l)(n  — k)  + k(k  — 1). 

36.  (a)  Each  adjacent  comparator  reduces  the  number  of  inversions  by  0 or  1,  and 
(n,  n— 1, . . . , 1)  has  (”)  inversions,  (b)  Let  a = /3\p:p+l],  and  argue  by  induction  on 
the  length  of  a.  If  p = i,  then  j > p+  1,  and  ( x/3)p  > ( x(3)j , (x/3)p+i  > ( x/3)j ; hence 
(y/3)p  > ( y/3)j  and  (y/3)p+ 1 > (y/3)j.  If  p = i - 1,  then  either  ( x/3)p  or  (x/3)p+i  is 
> (x(3)j;  hence  either  (y/3)p  or  (y/3)p+i  is  > ( y(3)j . If  p — j — 1 or  j,  the  arguments  are 
similar.  For  other  p the  argument  is  trivial. 
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Notes:  If  a is  a primitive  sorting  network,  so  is  aR  (the  comparators  in  reverse 
order).  For  generalizations  and  another  proof  of  (c),  see  N.  G.  de  Bruijn,  Discrete 
Mathematics  9 (1974),  333-339;  Indagationes  Math.  45  (1983),  125-132.  In  the  latter 
paper,  de  Bruijn  proved  that  a primitive  network  sorts  all  permutations  of  the  multiset 
{ni  • 1, . . . , rim  • m}  if  and  only  if  it  sorts  the  single  permutation  mCm  . . . 1C1 . The 
relation  x A y,  defined  for  permutations  x and  y to  mean  that  there  exists  a standard 
network  a such  that  x — ya,  is  called  Bruhat  order;  the  analogous  relation  restricted 
to  primitive  a is  weak  Bruhat  order  (see  the  answer  to  exercise  5.2.1-44). 

37.  It  suffices  to  show  that  if  each  comparator  is  replaced  by  an  interchange  operation 
we  obtain  a “reflection  network”  that  transforms  (xj,...,xn)  into  (xn, . . . ,Xi).  But 
in  this  interpretation  it  is  not  difficult  to  trace  the  route  of  Xk ■ Note  that  the  per- 
mutation 7t  = (1  2)(3  4) . . . (2n— 1 2n)(2  3)(4  5) . . . (2n— 2 2n— 1)  = (135  ...  2n— 1 
2 n 2n— 2 ...  2)  satisfies  irn  = (1  2n)(2  2n—  1) . . . (n— 1 n).  The  odd-even  transposition 
sort  was  mentioned  briefly  by  H.  Seward  in  1954;  it  has  been  discussed  by  A.  Grasselli 
[IRE  Trans.  EC-11  (1962),  483]  and  by  Kautz,  Levitt,  and  Waksman  [IEEE  Trans. 
C-17  (1968),  443-451].  The  reflective  property  of  this  network  was  introduced  much 
earlier  by  H.  E.  Dudeney  in  one  of  his  “frog  puzzles”  [Strand  46  (1913),  352,  472; 
Amusements  in  Mathematics  (1917),  193]. 

38.  Insert  the  elements  i j,  ...,  ijv  into  an  initially  empty  tableau  using  Algorithm 
5.1.41  but  with  one  crucial  change:  Set  PZJ  «—  xt  in  step  13  only  if  xt  / Pt(J^1).  It 
can  be  proved  that  xz  will  equal  P^j- 1)  in  that  step  only  if  xt  + 1 — Pij,  when  the 
inputs  i\  . . An  define  a primitive  sorting  network.  (The  parenthesized  assertions  of  the 
algorithm  need  to  be  modified.)  After  ij  has  been  inserted  into  P,  set  Qst  <—  j as  in 
Theorem  5.1.4A.  After  N steps,  the  tableau  P will  always  contain  (r,r  + l, . . . ,n  — 1)  in 
row  r,  while  Q will  be  a tableau  from  which  the  sequence  i\  . . An  can  be  reconstructed 
by  working  backwards. 

For  example,  when  n = 6 the  sequence  i\  . . An  — 413243543123514  corre- 
sponds to 


1 

2 

3 

41 

5 

l 

4 

5 

8 

13 

2 

3 

4 

5 

r 

2 

6 

7 

15 

~3~ 

4 

5 

, Q = 

3 

9 

12 

r 

4 

5 

10 

11 

5 

14 

The  transpose  of  Q corresponds  to  the  complementary  network  [n— i\  :n— ii+1] . . . 
[n— ijv  :n— iat+1]. 

References:  A.  Lascoux  and  M.  P.  Schiitzenberger,  Comp  tes  Rend  us  Acad.  Sci.  (I) 
295  (Paris,  1982),  629-633;  R.  P.  Stanley,  Eur.  J.  Combinatorics  5 (1984),  359-372; 
P.  H.  Edelman  and  C.  Greene,  Advances  in  Math.  63  (1987),  42-99.  The  diagrams  of 
primitive  sorting  networks  also  correspond  to  arrangements  of  pseudolines  and  to  other 
abstractions  of  two-dimensional  convexity;  see  D.  E.  Knuth,  Lecture  Notes  in  Comp. 
Sci.  606  (1992),  for  further  information. 

39.  When  n = 8,  for  example,  such  a network  must  include  the  comparators 
shown  here;  all  other  comparators  are  ineffective  on  10101010.  Then  lines 
[n/3]  ..  [2n/3]  = 3.. 6 sort  4 elements,  as  in  exercise  37.  (This  exercise  is 
based  on  an  idea  of  David  B.  Wilson.) 

Notes:  There  is  a one-to-one  correspondence  between  minimal-length 
primitive  networks  that  sort  a given  bit  string  and  Young  tableaux  whose 
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shape  is  bounded  by  the  zigzag  path  defined  by  that  bit  string.  Thus,  exercise  38 
yields  a one-to-one  correspondence  between  primitive  networks  of  (n^+1)  comparators 
that  sort  (10)n,/2  and  primitive  networks  of  (n/^+1)  comparators  that  sort  n/2  + 1 
arbitrary  numbers.  If  a primitive  network  sorts  the  bit  string  l"/2!)"/2 , we  can  make  a 
stronger  statement:  All  of  its  “halves,”  consisting  of  the  subnetworks  on  lines  k through 
k + n/2  inclusive,  are  sorting  networks,  for  1 < k < n/2.  (See  also  de  Bruijn’s  theorem, 
cited  in  the  answer  to  exercise  36.) 

40.  This  follows  by  applying  the  tail  inequalities  to  the  interesting  construction  in 
Proposition  7 of  a paper  by  H.  Rost,  Zeitschrift  fur  Wahrscheinlichkeitstheorie  und 
verwandte  Gebiete  58  (1981),  41-53,  setting  6 = 1,  o = j,  and  t = 4n  + y/n  Inn. 

Experiments  show  that  the  expected  time  to  reach  any  primitive  sorting  network  — 
not  necessarily  the  bubble  sort  — is  very  nearly  2n2.  Curiously,  R.  P.  Stanley  and  S.  V. 
Fomin  have  proved  that  if  the  comparators  [ik  : ifc+1]  are  chosen  nonuniformly  in  such  a 
way  that  ik  = j occurs  with  probability  //(”),  the  corresponding  expected  time  comes 
to  exactly  Q)  77^ . 

42.  There  must  exist  a path  of  length  [lgn]  or  more,  from  some  input  to  the  largest 
output  (consider  mn  in  Theorem  A);  when  that  input  is  set  to  oo,  the  comparators 
on  this  path  have  a predetermined  behavior,  and  the  remaining  network  must  be  an 
(n  - l)-sorter.  [IEEE  Trans.  C-21  (1972),  612-613.] 

45.  After  l levels  the  input  X\  can  be  in  at  most  2l  different  places.  After  merging  is 
complete,  x\  can  be  in  n + 1 different  places. 

46.  [J.  Algorithms  3 (1982),  79-88;  the  following  alternative  proof  is  due  to  V.  S. 
Grinberg.]  We  may  assume  that  1 < m < n and  that  every  stage  makes  m comparisons. 
Let  l = \(n  — m)/ 2]  and  suppose  we  are  merging  x\  < ■ ■ ■ < xm  with  y\  < ■ ■ • < yn-  An 
adversary  can  force  [lg(m  + n)]  stages  as  follows:  In  the  first  stage  some  Xj  is  compared 
to  an  element  yk  where  we  have  either  k < l or  k > l + m.  The  adversary  decides  that 
Xj-i  < 3/i  and  Xj+ 1 > yn ; also  that  Xj  > yk  'rik  < l,  Xj  < yk  if  k > l+m.  The  remaining 
task  is  essentially  to  merge  Xj  with  either  yk+i  < • ■ • < yn  and  k < l or  yi  < • • • < yk- i 
and  k > l + m;  so  at  least  min(n-/+l,  l+m)  = f(m  + n)/2]  outcomes  remain.  At  least 
[lg[(m  + n)/2\\  = [lg(m  + n)]  — 1 subsequent  stages  are  therefore  necessary. 

48.  Let  u be  the  smallest  element  of  (xa)j,  and  let  j/0)  be  any  vector  in  Dn  such  that 

(y(-0>)k  = 0 implies  (xa)k  contains  an  element  < u,  (y>0>)k  — 1 implies  ( xa)k  contains 
an  element  > u.  If  a = jd\p\q],  it  is  possible  to  find  a vector  y(1>  satisfying  the  same 
conditions  but  with  a replaced  by  /3,  and  such  that  y(1>\p'.q]  = Starting  with 

(ym)i  = i,  (y(0)h  = o,  we  eventually  reach  y = j/r*  satisfying  the  desired  condition. 

G.  Baudet  and  D.  Stevenson  have  observed  that  exercises  37  and  48  combine  to 
yield  a simple  sorting  method  with  ( nlnn)/k+0(n ) comparison  cycles  on  k processors: 
First  sort  k subfiles  of  size  < \n/k],  then  merge  them  in  k passes  using  the  “odd-even 
transposition  merge”  of  order  k.  [IEEE  Trans.  C-27  (1978),  84-87.] 

49.  Both  (x  V y)  V z and  xty  (yty  z)  represent  the  largest  m elements  of  the  multiset 
x i±J  y W z;  (x  ft  y)  ft  z and  x ft  (y  ft  z)  represent  the  smallest  m.  If  x = y = z = {0, 1}, 
(x  ft  z)  (y  ft  z)  = (x  ft  y)  (x  ft  z)  (y  ft  z)  = {0,0},  but  the  middle  elements 
of  {0,0,0, 1, 1, 1}  are  {0,1}.  Sorting  networks  for  three  elements  and  the  result  of 
exercise  48  imply  that  the  middle  elements  of  x ttl  y l±J  2 may  be  expressed  either  as 
((*  V y)  ft  z)  V (x  ft  y)  or  ((x  ft  y)  V z)  b (*  V y)  or  any  other  formula  obtained  by 
permuting  x,  y,  z in  these  expressions.  (There  seems  to  be  no  symmetrical  formula  for 
the  middle  elements.) 
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50.  Equivalently  by  Theorem  Z,  we  must  find  all  identities  satisfied  by  the  operations 

x V y = min(a:+j/,  1),  x y = max(0,  x+y-1) 

on  rational  values  x,  y in  [0 . . 1],  [This  is  the  operation  of  pouring  as  much  liquid  as 
possible  from  a glass  that  is  x full  into  another  that  is  y full,  as  observed  by  J.  M. 
Pollard.]  All  such  identities  can  be  obtained  from  a system  of  four  axioms  and  a rule  of 
inference  for  multivalued  logic  due  to  Lukasiewicz;  see  Rose  and  Rosser,  Trans.  Amer. 
Math.  Soc.  87  (1958),  1-53. 

51.  Let  a'  = a[i:j],  and  let  k be  an  index  ^ i,j.  If  (xa)i  < ( xa)k  for  all  x,  then 
(xa')i  < (xa')k;  if  (xa)k  < ( xa)i  and  (xa)k  < (xa)j  for  all  x,  the  same  holds  when  a is 
replaced  by  a';  if  ( xa)k  < (xa) , for  all  x,  then  (xa')k  < (xa')j.  In  this  way  we  see  that 
a has  at  least  as  many  known  relations  as  a,  plus  one  more  if  [i:j]  isn’t  redundant. 
[Bell  System  Tech.  J.  49  (1970),  1627-1644.] 

52.  (a)  Consider  sorting  Os  and  Is;  let  w = x0  + X\  H hxjv-  The  network  fails  if  and 

only  if  io  < f and  xo  = 1 before  the  complete  iV-sort.  If  xo  = 1 at  this  point,  it  must 
have  been  1 initially,  and  for  1 < j < n we  must  have  initially  had  either  X2j-i+2nk  = 1 
for  0 < k < m or  x2j+2nk  = 1 for  0 < fc  < m;  therefore  w > 1 + (m  + l)n  — t.  So  failure 
implies  that  w = t and  xj  = Xj+2nk  for  1 < k < m and  x2]  = x2j-i  for  1 < j < n. 
Furthermore  the  special  subnetwork  must  transform  such  inputs  so  that  X2m+2 n+j  — 1 
for  1 < j < m. 

(b)  For  example,  the  special  subnetwork  for  (j/i  V y2  V ^3)  A (§2  V 1/3  V ^4)  A . . . 
could  be 

[1  + 2n : 2mn  + 2n  + 1][3  + 2n:2mn  + 2 n+  1]  [6  + 2n:2mn  + 2n  + 1] 

[4  + An\2mn  + 2n  + 2][5  + 4n:2mn  + 2n  + 2][8  + 4n:2mn  + 2n  + 2] ..., 

using  X2j-\+2kn  and  x2j+2kn  to  represent  yj  and  y3  in  the  fcth  clause,  and  X2m+2n+k 
to  represent  that  clause  itself. 

53.  Paint  the  lines  red  or  blue  according  to  the  following  rule: 

if  i mod  4 is  then  line  i in  case  (a)  is  and  in  case  (b)  it  is 

0 red  red; 

1 blue  red; 

2 blue  blue; 

3 red  blue. 

Now  observe  that  the  first  t — 1 levels  of  the  network  consist  of  two  separate  networks, 
one  for  the  2<~1  red  lines  and  another  for  the  2t_1  blue  lines.  The  comparators  on 
the  tth  level  complete  a merging  network,  as  in  the  bitonic  or  odd-even  merge.  This 
establishes  the  desired  result  for  k = 1. 

The  red-blue  decomposition  also  establishes  the  case  k = 2.  For  if  the  input  is 
4-ordered,  the  red  lines  contain  2t_1  numbers  that  are  2-ordered,  and  so  do  the  blue 
lines,  so  we  are  left  with 

xoyoyiXiX2y2y3X3  ■ ■ ■ (case  (a))  or  xoXiyoyiX2X3y2y3  ■ ■ ■ (case  (b)) 
after  t — 1 levels;  the  final  result 

(xoAy0)(x0\/yo){yiAxi)(yi  Vxj) ...  or  x0(xi  Ay0){xi\Zy0)(yi  Aa:2)(i/i  Vx2)  • • • 
is  clearly  2-ordered. 
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Now  for  k > 2,  we  can  assume  that  k < t.  The  first  t — k + 2 levels  decompose 
into  2k~2  separate  networks  of  size  2t_fc+2,  which  each  are  2-ordered  by  the  case  k = 2; 
hence  the  lines  are  2 fc_1  -ordered  after  t — k + 2 levels.  The  subsequent  levels  clearly 
preserve  2fc-1-ordering,  because  they  have  a “vertical”  periodicity  of  order  2k~2 . (We 
can  imagine  — oo  on  lines  —1,  —2,  . . . and  +oo  on  lines  2* , 2*  + 1,  . . . .) 

References:  Network  (a)  was  introduced  by  M.  Dowd,  Y.  Perl,  L.  Rudolph, 
and  M.  Saks,  JACM  36  (1989),  738-757;  network  (b)  by  E.  R.  Canfield  and  S.  G. 
Williamson,  Linear  and  Multilinear  Algebra  29  (1991),  43-51.  It  is  interesting  to  note 
that  in  case  (a)  we  have  Dna  = Gt , where  Gt  is  defined  in  exercise  32  [Dowd  et  al., 
Theorem  17];  thus  the  image  of  D„  is  not  enough  by  itself  to  characterize  the  behavior 
of  a periodic  network. 


54.  The  following  construction  by  Ajtai,  Komlos,  and  Szemeredi  [FOCS  33  (1992), 
686-692]  shows  how  to  sort  m3  elements  with  four  levels  of  m2-sorters:  We  may  suppose 
that  the  elements  being  sorted  are  Os  and  Is;  let  the  lines  be  numbered  (a,  b,  c)  = 
am2  + bm  + c for  0 < a,  b,  c < m.  The  first  level  sorts  the  lines  {(a,  b,  (b  + k ) mod  m)  | 
0 < a,  6 < m}  for  0 < k < m\  let  a*,  be  the  number  of  Is  in  the  fcth  group  of  m2  lines. 
The  second  level  sorts  {(a,  b,  k)  \ 0 < a,  b < m}  for  0 <k<m\  the  number  of  Is  in  the 
fcth  group  is  then 


b*  = £ 


i= o 


mod  m T 3 


and  it  follows  that  60  < 61  + 1,  61  < ba  + 1,  . . . , bm-i  < bo  + 1.  In  the  third  level  we 
sort  {(k,  a,b)  \ 0 < o,  b < m}  for  0 < k < m;  the  number  of  Is  in  the  fcth  group  is 


Cfc 


= ££ 

t=0  j=0  L 


bi  + km  + j 


If  0 < Ck+i  < m2  we  have  c*,  < (m2~1)  and  Cj  = 0 for  j < k.  Similarly,  if  0 < c*,  < m 2 
we  have  Ck+i  > m 2 — (m~1)  and  c3-  = 0 for  j > k + 1.  Consequently  a fourth  level  that 
sorts  lines  m2k  — (m^1)  . . m2k  + — 1 for  0 < k < m will  complete  the  sorting. 

It  follows  that  four  levels  of  m-sorters  will  sort  /(m)  = [y/m \3  elements,  and  16 
levels  will  sort  f(f{m))  elements.  This  proves  the  stated  result,  since  f(f(m))  > m2 
when  m > 24.  (The  construction  is  not  “tight,”  so  we  can  probably  do  the  job  with 
substantially  fewer  than  16  levels.) 

[If  P(n)  denotes  the  minimum  number  of  switches  needed  in 
a permutation  network,  it  is  clear  that  P(n ) > [lgn!].  By 
slightly  extending  a construction  due  to  L.  J.  Goldstein  and 
S.  W.  Leibholz,  IEEE  Trans.  EC-16  (1967),  637-641,  one 
can  show  that  P(n)  < P([n/2J)  + P([n/2])  + n — 1,  hence 
P(n)  < B(n)  for  all  n,  where  B(n)  is  the  binary  insertion  function  of  Eq.  5.3.1-(3). 
M.  W.  Green  has  proved  (unpublished)  that  P(5)  = 8.] 

56.  In  fact  we  can  construct  ax  inductively  so  that  xax  = 0fc-1101n~A’_1,  when  x has 
k zeros.  The  base  case,  Qio,  is  empty.  Otherwise  at  least  one  of  the  following  four 
cases  applies,  where  y is  not  sorted:  (1)  x = y 0,  ax  = ay[n—l:n][n—2:n—l] ...  [1:2]. 
(2)  x = yl,  ax  = ay[l:n][2:n] . . . [n— l:n].  (3)  x = Oy,  ax  = ay  [1  :n][l :n— 1] ...  [1:2]. 
(4)  x = ly,  ax  = [1 : 2] [2 : 3] . . . [n— l:n].  The  network  a+  is  obtained  from  a by 

changing  each  comparator  [i:j]  to  [*+l:j+l].  [See  M.  J.  Chung  and  B.  Ravikumar, 
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Discrete  Math.  81  (1990),  1-9.]  This  construction  uses  (")  - 1 comparators;  can  it  be 
done  with  substantially  fewer? 

57.  [See  H.  Zhu  and  R.  Sedgewick,  STOC  14  (1982),  296-302.]  The  stated  delay  time 
is  easily  verified  by  induction.  But  the  problem  of  analyzing  the  recurrence 

A(m,n)  = A(\m/ 2j,  \n/Z\)  + A(\m/2\,  \n/2\)  + [m/2]  + [n/2]  - 1, 

when  A(0,n)  = A(m,  0)  = 0,  is  more  difficult. 

A bitonic  merge  makes  B(m,n)  = C'(m  + n)  comparisons;  see  (15).  Therefore  we 
can  use  the  fact  that  {|m/2j  + [n/2],  [m/2]  + [n/2]}  = {L(m  + n)/2j,  \(m  + n)/ 2]} 
to  show  that  B(m,n)  = B(|m/2j,  [n/2])  + B(\m/ 2],  [n/2])  + [(m  + n)/2j.  Then 
A(m,n ) < B(m,n)  by  induction. 

Let  D(m,  n)  = C{m  + 1,  n + 1)  + C(m,  n)  - C(m  + 1,  n)  - C(m,  n+  1).  We  have 
D(0,  n)  — D(m,  0)  = 1,  and  D[m:  n)  = 1 when  m + n is  odd.  Otherwise  m + n is  even 
and  mn  > 1,  and  we  have  D(m,n)  = D([m/2\,  [n/2j)  - 1.  Consequently  D(m,n)  < 1 
for  all  m,n  > 0. 

The  recurrence  for  A is  equivalent  to  the  recurrence  for  C except  when  m and  n are 
both  odd.  And  in  that  case  we  have  A(m,n)  > C([m/2J,  [n/2])  + C([m/2],  |_n/2j)  + 
[m/2]  + [n/2]  — 1 = C(m,n)  + 1 — D(\m/2\,  [n/2])  > C(m,n)  by  induction. 

Let  l = [lg  min(m,  n)] . On  level  k of  the  even-odd  recursion,  for  0 < k < l,  we  per- 
form 2k  merges  of  the  respective  sizes  (mjk,njk)  = ([(m+j)/2k\ , [(n  + 2k  - l-j)/2k\) 
for  0 < j < 2k . The  cost  of  recursion,  ^([m^/2]  + \njk/ 2]  -l),  is  fk{m)  + fk(n)  - 2fc; 
we  can  write  fk(n)  = ma x(n'k,n  -n'k),  where  n'k  = 2k[n/2k+1  + 1/2]  is  the  multiple 
of  2k  that  is  nearest  to  n/2.  Since  0 < fk(n)  - n/2  < 2fc~1,  the  total  cost  of  recursion 
for  levels  0 to  / - 1 lies  between  |(m  + n)l  - 2l  and  |(m  + n)l. 

Finally,  if  m < n,  the  2l  merges  (rriji,  nji)  on  level  l have  rriji  = 0 for  0 < j < 2 l~m, 
and  rriji  = 1 for  the  other  m values  of  j.  Since  .4(1,  n)  = n,  the  total  cost  of  level  l is 

< Er=nn_1  k/m  = stl  + n. 

Thus  even-odd  merging,  unlike  bitonic  merging,  is  within  0(m  + n)  of  the  opti- 
m™  number  of  comparisons  M(m,n).  Our  derivation  shows  in  fact  that  A(m,n)  = 
Efc=o  (/*(m)  + fk{n)  - 2k)  + 9t(m  + n)  — gi(max(m,  n)),  where  gi(n)  can  be  expressed 
in  the  form  J2lZolk/2‘\  = [n/2l\{n-2l~1{\ri/2l\  + l)). 

58.  If  h[k  + 1]  = h[k]  + 1 and  the  file  is  not  in  order,  something  must  happen  to  it 
on  the  next  pass;  this  decreases  the  number  of  inversions,  by  exercise  5. 2. 2-1,  hence 
the  file  will  eventually  become  sorted.  But  if  h[k  + 1]  > h[k]  + 2 for  1 < k < m,  the 
smallest  key  will  never  move  into  its  proper  place  if  it  is  initially  in  R2 . 

59.  We  use  the  hint,  and  also  regard  KN+1  = KN+ 2 = • • • = 1.  If  Kh[ ij+>  = • • • = 
Kh[m]+j  — 1 at  step  j , and  if  K,  = 0 for  some  i > h[l]  + j , we  must  have  i < h[m ] + j 
since  there  are  fewer  than  n Is.  Suppose  k and  i are  minimal  such  that  h[k]  + j < i < 
h[k  + 1]  + j and  Ki  = 0.  Let  s = h[k  + 1]  + j — i\  we  have  s < h[k  + 1]  — h[k\  < k.  At 
step  j — s.  at  least  1 + 1 Os  must  have  been  under  the  heads,  since  Kt  = Kh[k+ i]+j_s 
was  set  to  zero  at  that  step;  s steps  later,  there  are  at  least  fc+l-s>2  0s  remaining 
between  Khy  1]+j  and  K,  , inclusive,  contradicting  the  minimality  of  i. 

The  second  pass  gets  the  next  n — 1 elements  into  place,  etc.  If  we  start  with  the 
permutation  N N-l  ...  2 1,  the  first  pass  changes  it  to 

IV+l-n  N-n  ...  1 IV+2— n . . . N-l  N, 

since  Kh[1]+j  > Kh[m]+j  whenever  1 < h[l]+j  and  h[m]+j  < IV;  therefore  the  bound 
is  best  possible. 
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60.  Suppose  that  h[k  + 1]  — s > h[k]  and  h[k]  < s;  the  smallest  key  ends  in  position 
Ri  for  i > 1 if  it  starts  in  Rn-S.  Therefore  h[k  + 1]  < 2 h[k]  is  necessary;  it  is  also 
sufficient,  by  the  special  case  t = 0 of  the  following  result: 

Theorem.  If  n = N and  if  K\...Kn  is  a permutation  of  {1,2,  ...,n},  a single 
sorting  pass  will  set  K,  — i for  1 < i < t + 1,  if  h[k  + 1]  < h[k]  + h[k  — i]  + i for 
1 < k < m and  0 < i < t.  (By  convention,  let  h[k]  = k when  k < 0.) 

Proof.  By  induction  on  t;  if  step  t does  not  find  the  key  t + 1 under  the  heads,  we  may 
assume  that  it  appears  in  position  f?h[fc+1]+(_s  for  some  s > 0,  where  h[k  + l]  -s  < h[k ]; 
hence  h[k  — t\  + t — s > 0.  But  this  is  impossible  if  we  consider  step  t — s,  which 
presumably  placed  the  element  t + 1 into  position  Rh[k+i]+t-a  although  there  were  at 
least  t + 1 lower  heads  active.  | 

(The  condition  is  necessary  for  t = 0 and  t = 1,  but  not  for  t — 2.) 

61.  If  the  numbers  {1, . . . ,23}  are  being  sorted,  the  theorem  in  the  previous  exercise 
shows  that  {1,  2,3,4}  find  their  true  destination.  When  Os  and  Is  are  being  sorted  one 
can  verify  that  it  is  impossible  to  have  all  heads  reading  0 while  all  positions  not  under 
the  heads  contain  Is,  at  steps  —2,  —1,  and  0;  hence  the  proof  in  the  previous  exercise 
can  be  extended  to  show  that  {5, 6,  7}  find  their  true  destination.  Finally  {8, . . . , 23} 
must  be  sorted,  by  the  argument  in  exercise  59. 

63.  When  r < m— 2,  the  heads  take  the  string  0P110130170  . . . 012r_1019  into  the  string 

0P+1 lx0130170 . . . 012  -1012  ~x+9;  hence  m — 2 passes  are  necessary.  [When  the  heads 

are  at  positions  1, 2, 3, 5, . . . , 1 + 2m_2,  Pratt  has  discovered  a similar  result:  The  string 
0pla012i>~10126+1-10  . . . 012"-10r3,  1 < a < 2i>_1,  goes  into  0p+1la-1012'>-1012'>+1-10 
. . . 012  _x012  +q,  hence  at  least  m—  [log2  m]  — 1 passes  are  necessary  in  the  worst  case 

for  this  sequence  of  heads.  The  latter  head  sequence  is  of  special  interest  since  it  has 
been  used  as  the  basis  of  a very  ingenious  sorting  device  invented  by  P.  N.  Armstrong 
[see  U.S.  Patent  3399383  (1965)].  Pratt  conjectures  that  these  input  sequences  provide 
the  true  worst  case,  over  all  inputs.] 

64. During quicksort, each key $K_2, \ldots, K_N$ is compared with $K_1$; let $A = \{i \mid K_i < K_1\}$,
$B = \{j \mid K_j > K_1\}$. Subsequent operations quicksort $A$ and $B$ independently; all
comparisons $K_i : K_j$ for $i$ in $A$ and $j$ in $B$ are suppressed, by both quicksort and
the restricted uniform algorithm, and no other comparisons are suppressed by the
unrestricted uniform algorithm.

In  this  case  we  could  restrict  the  algorithm  even  further,  omitting  cases  1 and  2 so 
that  arcs  are  added  to  G only  when  comparisons  are  explicitly  made,  yet  considering 
only  paths  of  length  2 when  testing  for  redundancy.  Another  way  to  solve  this  problem 
is  to  consider  the  equivalent  tree  insertion  sorting  algorithm  of  Section  6.2.2,  which 
makes  precisely  the  same  comparisons  as  the  uniform  algorithm  in  the  same  order. 

65. (a) The probability that $K_{a_i}$ is compared with $K_{b_i}$ is the probability that $c_i$ other
specified keys do not lie between $K_{a_i}$ and $K_{b_i}$; this is the probability that two numbers
chosen at random from $\{1, 2, \ldots, c_i + 2\}$ are consecutive, namely $2/(c_i + 2)$.

(b) The first $n - 1$ values of $c_i$ are zero, then come $(n - 2)$ 1s, $(n - 3)$ 2s,
etc.; hence the average is $2\sum_{k=1}^{n}(n - k)/(k + 1) = 2\sum_{k=1}^{n}\bigl((n + 1)/(k + 1) - 1\bigr) =
2(n + 1)(H_{n+1} - 1) - 2n$.

(c)  The  “bipartite”  nature  of  merging  shows  that  the  restricted  uniform  algorithm 
is  the  same  as  the  uniform  algorithm  for  this  sequence.  The  pairs  involving  vertex  N 
have $c$'s equal to $0, 1, \ldots, N - 2$, respectively; so the average number of comparisons is
exactly  the  same  as  quicksort. 
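The sum in part (b) can be checked with exact rational arithmetic. The sketch below uses illustrative helper names and assumes, as in (b), that the value $c = k - 1$ occurs $n - k$ times and that a pair with that $c$ is compared with probability $2/(c + 2)$. It compares the result with the closed form of (b) and with the familiar quicksort average $2(n + 1)H_n - 4n$ for a quicksort that compares each key once with the pivot.

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def uniform_average(n):
    """2 * sum_{k=1}^{n} (n - k)/(k + 1): the value c = k - 1 occurs n - k times,
    and a pair with that c is compared with probability 2/(c + 2)."""
    return sum((Fraction(2 * (n - k), k + 1) for k in range(1, n + 1)), Fraction(0))

def closed_form(n):
    """2(n + 1)(H_{n+1} - 1) - 2n, the right-hand side in part (b)."""
    return 2 * (n + 1) * (harmonic(n + 1) - 1) - 2 * n

def quicksort_average(n):
    """2(n + 1)H_n - 4n, quicksort's average when each key meets the pivot once."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

for n in range(1, 20):
    assert uniform_average(n) == closed_form(n) == quicksort_average(n)
```

Exact fractions avoid any doubt about rounding when the three expressions are compared.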

66. No; when $N = 5$ every pair sequence beginning with $(1,2)(2,3)(3,4)(4,5)(1,5)$ will
avoid  at  least  one  subsequent  comparison.  [An  interesting  research  problem:  For  all  N, 
find  a (restricted)  uniform  sorting  method  whose  worst  case  is  as  good  as  possible.] 

67. Suppose $c_i = j$ for exactly $t_j$ values of $i$. For the restricted case we need to prove
that $\sum_j t_j/(2 + j)$ is minimized when $(t_0, t_1, \ldots, t_{N-2}) = (N - 1, \ldots, 2, 1)$. Gil Kalai
has shown that the achievable vectors $(t_0, t_1, \ldots, t_{N-2})$ are always lexicographically
$\ge (N - 1, \ldots, 2, 1)$; see Graphs and Combinatorics 1 (1985), 65-79.

68.  An  item  can  lose  at  most  one  inversion  per  pass,  so  the  minimum  number  of  passes 
is  at  least  the  maximum  number  of  inversions  of  any  item  in  the  input  permutation. 
The  bubble  sort  strategy  achieves  this  bound,  since  each  pass  decreases  the  inversion 
count of every inverted item by one (see exercise 5.2.2-1). An additional pass may
be  needed  to  determine  whether  or  not  sorting  is  complete,  but  the  wording  of  this 
exercise  allows  us  to  overlook  such  considerations. 

It  is  perhaps  unfortunate  that  the  first  theorem  in  the  study  of  computational 
complexity  via  automata  established  the  “optimality”  of  a sorting  method  that  is  so 
poor  from  a programming  standpoint!  The  situation  is  analogous  to  the  history  of 
random  number  generation,  which  took  several  backward  steps  when  generators  that 
are  “optimum”  from  one  particular  point  of  view  were  recommended  for  general  use. 
(See the comments following Eq. 3.3.3-(39).) The moral is that optimality results are
often  heavily  dependent  on  the  abstract  model;  the  results  are  quite  interesting,  but 
they  must  be  applied  wisely  in  practice. 

[Demuth  went  on  to  consider  a generalization  to  an  r-register  machine  (saving  a 
factor  of  r),  and  to  a Turing-like  machine  in  which  the  direction  of  scan  could  oscillate 
between  left-right  and  right-left  at  will.  He  observed  that  the  latter  type  of  machine  can 
do  the  straight  insertion  and  the  cocktail-shaker  sorts;  but  any  such  1-register  machine 
must go through at least $\frac14(N^2 - N)$ steps on the average, since each step reduces the
total  number  of  inversions  by  at  most  one.  Finally  he  considered  r-register  random- 
access  machines  and  the  question  of  minimum-comparison  sorting.  These  portions  of 
his  thesis  have  been  reprinted  in  IEEE  Transactions  C-34  (1985),  296-310.] 

SECTION  5.4 

1.  We  could  omit  the  internal  sorting  phase,  but  that  would  generally  be  much  slower 
since  it  would  increase  the  number  of  times  each  piece  of  data  is  read  and  written  on 
the  external  memory. 

2. The runs are distributed as in (1), then Tape 3 is set to $R_1 \ldots R_{2000000}$; $R_{2000001} \ldots
R_{4000000}$; $R_{4000001} \ldots R_{5000000}$. After all tapes are rewound, a "one-way merge" sets T1
and T2 to the respective contents of T3 and T4 in (2). Then T1 and T2 are merged
to  T3,  and  the  information  is  copied  back  and  merged  once  again,  for  a total  of  five 
passes.  In  general,  the  procedure  is  like  the  four-tape  balanced  merge,  but  with  copy 
passes  between  each  of  the  merge  passes,  so  one  less  than  twice  as  many  passes  are 
performed. 

3. (a) $\lceil \log_P S \rceil$. (b) $\log_B S$, where $B = \sqrt{P(T - P)}$ is called the "effective power of
the merge." When $T = 2P$ the effective power is $P$; when $T = 2P - 1$ the effective
power is $\sqrt{P(P - 1)} = P - \frac12 - \frac18 P^{-1} + O(P^{-2})$, slightly less than $\frac12 T$.

4. $\frac12 T$. If $T$ is odd and $P$ must be an integer, both $\lceil T/2 \rceil$ and $\lfloor T/2 \rfloor$ give the same
maximum value. It is best to have $P \ge T - P$, according to exercise 3, so we should
choose $P = \lceil T/2 \rceil$ for balanced merging.
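A quick numerical look at exercises 3 and 4. The sketch assumes, as in 3(b), that a merge reading from $P$ tapes and writing to the other $T - P$ behaves like a merge of power $B = \sqrt{P(T - P)}$, so that roughly $\log_B S$ passes are needed for $S$ initial runs; the function names are illustrative only.

```python
import math

def effective_power(P, T):
    """B = sqrt(P(T - P)) for P input tapes and T - P output tapes."""
    return math.sqrt(P * (T - P))

def approx_passes(S, P, T):
    """Approximate number of passes, log_B S, for S initial runs."""
    return math.log(S) / math.log(effective_power(P, T))

def best_P(T):
    """Integer P in 1..T-1 maximizing B; ties are broken toward P >= T - P."""
    return max(range(1, T), key=lambda P: (P * (T - P), P))

# With T = 7, P = 3 and P = 4 give the same B = sqrt(12); best_P(7) picks 4.
```

The tie-breaking key mirrors the remark in exercise 4 that $P = \lceil T/2 \rceil$ is preferable when $T$ is odd.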


SECTION  5.4.1 

1. [Diagram: tree with keys 087, 154, 170, 426, 503, 612, 653, 908 and ∞ entries; the layout is not recoverable from this copy.]

2.  The  path  |061| — (512) — (087) — (154) — (561)  would  be  changed  to  |612| — (612) — (512) — 
(154)— (087).  (We  are  essentially  doing  a “bubble  sort”  along  the  path!) 

3.  and  fourscore  our  seven  years/  ago  brought  fathers  forth  on  this/ 

a conceived  continent  in  liberty  nation  new  the  to/  and  dedicated  men 
proposition  that/  all  are  created  equal. 

4.  (The  problem  is  slightly  ambiguous;  in  this  interpretation  we  do  not  clear  the 
internal  memory  until  the  reservoir  is  about  to  overflow.) 


and  fourscore  on  our  seven  this  years/  ago  brought  continent  fathers 
forth  in  liberty  nation  new  to/  a and  conceived  dedicated  men 
proposition  that  the/  all  are  created  equal. 

5.  False;  the  complete  binary  tree  with  P external  nodes  is  defined  for  all  P > 1. 

6. Insert "If T = LOC(X[0]) then go to R2, otherwise" at the beginning of step R6,
and  delete  the  similar  clause  from  step  R7. 

7.  There  is  no  output,  and  RMAX  stays  equal  to  0. 

8. If any of the first P actual keys were ∞, their records would be lost. To avoid ∞,
we can make two almost-identical copies of the program; the first copy omits the test
involving LASTKEY in step R4, and it jumps to the second copy when RC ≠ 0 in step R3
for the first time. The second copy needs no step R1, and it never needs to test RC in
step R3. (Further optimization is possible because of answer 10.)

9.  Assume,  for  example,  that  the  current  run  is  ascending,  while  the  next  should  be 
descending.  Then  the  steps  of  Algorithm  R will  work  properly  except  for  one  change: 
In  step  R6,  if  RN(L)  = RN(Q)  > RC,  reverse  the  test  on  KEY(L)  versus  KEY(Q). 

When  RC  changes,  the  key  tests  of  steps  R4  and  R6  should  change  appropriately. 

10. Let ·j = LOC(X[j]), and suppose we add the unnecessary assignment 'LOSER(·0) ←
Q' at the beginning of step R3. The mechanism of Algorithm R ensures that the following
conditions are true just after we've done that assignment: The values of LOSER(·0),
..., LOSER(·(P − 1)) are a permutation of {·0, ·1, ..., ·(P − 1)}; and there exists a
permutation of the pointers {LOSER(·j) | RN(LOSER(·j)) = 0} that corresponds to an
actual tournament. In other words, when RN(·j) is zero, the value of KEY(·j) is irrelevant;
we may permute such "winners" among themselves. After P iterations all RN(·j)
will be nonzero, so the entire tree will be consistent. (The answer to the hint is "yes.")

David  P.  Kanter  observes  that  we  can  go  directly  from  R6  to  R4  as  soon  as  RN (Q)  = 
0,  thereby  avoiding  all  comparisons  that  involve  uninitialized  keys  when  N > P. 

11.  True.  (The  proof  of  Theorem  K notes  that  both  keys  belong  to  the  same  sub- 
sequence.) 
13. The keys left in memory when the first run has ended tend to be smaller than
average,  since  they  didn’t  make  it  into  the  first  run.  Thus  the  second  run  can  output 
more  of  the  smaller  keys. 

14. Assume that the snow suddenly stops when the snowplow is at a random point $u$,
$0 < u < 1$, after it has reached its steady state. Then the next-to-last run contains
$(1 + 2u - u^2)P$ records, and the last run contains $u^2P$. Integrating this times $du$ yields
an average of $(2 - \frac13)P$ records in the penultimate run, $\frac13 P$ in the last.

15.  False;  the  last  run  can  be  arbitrarily  long,  whenever  all  records  in  memory  belong 
to  the  same  run  at  the  moment  the  input  is  exhausted  (for  example,  in  a one-pass  sort). 

16. If and only if each element has fewer than $P$ inversions. (See Sections 5.1.1, 5.4.8.)
The probability is 1 when $N \le P$, $P^{N-P}P!/N!$ when $N > P$, by considering inversion
tables.  (In  actual  practice,  however,  a one-pass  sort  is  not  too  uncommon,  since  people 
tend  to  sort  a file  even  when  they  suspect  it  might  be  in  order,  as  a precautionary 
measure.) 
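The inversion-table count behind this probability can be confirmed by brute force for small $N$. The sketch assumes the inversion-table convention of Section 5.1.1 ($b_j$ is the number of elements to the left of $j$ that exceed $j$) and counts permutations in which every $b_j < P$; the helper names are illustrative, and the formula function returns $N!$ times the stated probability.

```python
from itertools import permutations
from math import factorial

def inversion_table(perm):
    """b_j = number of elements to the left of j that are greater than j."""
    where = {v: i for i, v in enumerate(perm)}
    return [sum(1 for u in perm[:where[j]] if u > j) for j in range(1, len(perm) + 1)]

def one_pass_count(N, P):
    """Brute force: permutations of 1..N in which every element has fewer than P inversions."""
    return sum(1 for perm in permutations(range(1, N + 1))
               if max(inversion_table(perm)) < P)

def one_pass_formula(N, P):
    """N! times the probability in the answer: N! if N <= P, else P^(N-P) * P!."""
    return factorial(N) if N <= P else P ** (N - P) * factorial(P)

# Probability of a one-pass sort for given N and P: one_pass_count(N, P) / factorial(N).
```

The formula is the product $\prod_j \min(P, N + 1 - j)$ of the number of allowed values of each $b_j$.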

17. Exactly $\lceil N/P \rceil$ runs, all but the last having length $P$. (The "worst case.")

18. Nothing changes on the second pass, since it is possible to show that the $k$th
record of a run is less than at least $P + 1 - k$ records of the preceding run, for $1 \le
k \le P$. (However, there seems to be no simple way to characterize the result of $P$-way
replacement selection followed by $P'$-way replacement selection when $P' > P$.)

19. Argue as in the derivation of (2) that $h(x, t)\,dx = KL\,dt$, where this time $h(x, t) =
I + Kt$ for all $x$, and $P = LI$. This implies $x(t) = L\ln((I + Kt)/I)$, so that when
$x(T) = L$ we have $KT = (e - 1)I$. The amount of snowfall since $t = 0$ is therefore
$(e - 1)LI = (e - 1)P$.

20. As in exercise 19, we have $(I + Kt)\,dx = K(L - x)\,dt$; hence $x(t) = LKt/(I + Kt)$.
The snow in the reservoir is $LI = P = P' = \int_0^T x(t)K\,dt = L(KT - I\ln((I + KT)/I))$;
hence $KT = \alpha I$, where $\alpha \approx 2.14619$ is the root of $1 + \alpha = e^{\alpha-1}$. The run length is the
total amount of snowfall during $0 \le t \le T$, namely $LKT = \alpha P$.
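The constant $\alpha$ can be recomputed by bisection. The function $e^{a-1} - 1 - a$ is negative at $a = 1$, positive at $a = 3$, and increasing for $a > 1$, so it has exactly one root there; the sketch below (an illustrative helper, not from the text) finds it.

```python
import math

def alpha(tol=1e-12):
    """Positive root of 1 + a = e^(a - 1), located by bisection on [1, 3]."""
    def f(a):
        return math.exp(a - 1) - 1 - a
    lo, hi = 1.0, 3.0          # f(1) = -1 < 0 and f(3) = e^2 - 4 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The run length in exercise 20 is then alpha() * P, about 2.146 P.
```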

21. Proceed as in the text, but after each run wait for $P - P'$ snowflakes to fall before
the plow starts out again. This means that $h(x(t), t)$ is now $KT_1$ instead of $KT$, where
$T_1 - T$ is the amount of time taken by the extra snowfall. The run length is $LKT_1$,
$x(t) = L(1 - e^{-t/T_1})$, $P = LKT_1e^{-T/T_1}$, and $P' = \int_0^T x(t)K\,dt = P + LK(T - T_1)$. In
other words, a run length of $e^\theta P$ is obtained when $P' = (1 - (1 - \theta)e^\theta)P$, for $0 \le \theta \le 1$.
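To read off the run length for a given reservoir size, one can invert $P'/P = 1 - (1 - \theta)e^\theta$ numerically. Its derivative with respect to $\theta$ is $\theta e^\theta \ge 0$, so bisection on $[0, 1]$ works; the sketch below uses hypothetical helper names.

```python
import math

def reservoir_ratio(theta):
    """P'/P = 1 - (1 - theta) e^theta, for 0 <= theta <= 1."""
    return 1 - (1 - theta) * math.exp(theta)

def run_length_ratio(r, tol=1e-12):
    """Given r = P'/P in [0, 1], return (run length)/P = e^theta.

    reservoir_ratio is nondecreasing on [0, 1], so bisection finds theta."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if reservoir_ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)
```

At the extremes this reproduces the familiar cases: no reservoir gives runs of length $P$, and $P' = P$ gives $eP$.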

22. For $0 \le t \le (k - 1)T$, $dx \cdot h = K\,dt\,(x(t + T) - x(t))$, and for $(k - 1)T \le t \le kT$,
$dx \cdot h = K\,dt\,(L - x(t))$, where $h$ is seen to be constantly equal to $KT$ at the position
of the plows. It follows that for $0 \le j < k$, $0 \le u \le 1$, and $t = (k - j - u)T$,
we have $x(t) = L(1 - e^{u-\theta}F_j(u)/F_k(\theta))$. The run length is $LKT$, the amount of
snowfall between the times that consecutive snowplows leave point 0 in the steady
state; $P$ is the amount cleared during each snowplow's last burst of speed, namely
$KT(L - x(kT)) = LKTe^{-\theta}/F_k(\theta)$; and $P' = \int_0^{kT} x(t)K\,dt$ can be shown to have the
stated form.

Notes: It turns out that the stated formulas are valid also for $k = 0$. When
$k > 1$ the number of elements per run that go into the reservoir twice is $P'' =
\int_0^{(k-1)T} x(t)K\,dt$, and it is easy to show that (run length) $- P' + P'' = (e - 1)P$,
a phenomenon noticed by Frazer and Wong. Is it a coincidence that the generating
function for $F_k(\theta)$ is so similar to the generating function in exercise 5.1.3-11?


5.4.1 


ANSWERS  TO  EXERCISES  679 


23. Let $P = pP'$ and $q = 1 - p$. For the first $T_1$ units of time the snowfall comes
from the $qP'$ elements remaining in the reservoir after the first $pP'$ have been initially
removed in random order; and when the old reservoir is empty, uniform snow begins to
fall again. We choose $T_1$ so that $LKT_1 = qP'$. For $0 \le t \le T_1$, $h(x, t) = (p + qt/T_1)g(x)$,
where $g(x)$ is the height of snow put into the reservoir from position $x$; for $T_1 \le t \le T$,
$h(x, t) = g(x) + (t - T_1)K$. For $0 \le t \le T_1$, $g(x(t))$ is $(q(T_1 - t)/T_1)g(x(t)) + (T - T_1)K$;
and for $T_1 \le t \le T$, $g(x(t)) = (T - t)K$. Hence $h(x(t), t) = (T - T_1)K$ for $0 \le t \le T$,
and $x(t) = L(1 - \exp(-t/(T - T_1)))$. The total run length is $LK(T - T_1)$; the total
amount "recycled" from the reservoir back again (see exercise 22) is $LKT_1$; and the
total amount cleared after time $T$ is $P = KT(L - x(T))$.

So the assumptions of this exercise give runs of length $(e^s/s)P$ when the reservoir
size is $(1 + (s - 1)e^s/s)P$. This is considerably worse than the results of exercise 22,
since  the  reservoir  contents  are  being  used  in  a more  advantageous  order  in  that  case. 

(The  fact  that  h(x(t),  t)  is  constant  in  so  many  of  these  problems  is  not  surprising, 
since  it  is  equivalent  to  saying  that  the  elements  of  each  run  obtained  during  a steady 
state  of  the  system  are  uniformly  distributed.) 

24. (a) Essentially the same proof works; each of the subsequences has runs in the
same direction as the output runs. (b) The stated probability is the probability that
the run has length $n + 1$ and is followed by $y$; it equals $(1 - x)^n/n!$ when $x > y$, and
it is $(1 - x)^n/n! - (y - x)^n/n!$ when $x < y$. (c) Induction. For example, if the $n$th
run is ascending, the $(n - 1)$st was descending with probability $p$, so the first integral
applies. (d) We find that $f'(x) = f(x) - c - pf(1 - x) - qf(x)$, then $f''(x) = -2pc$,
which ultimately leads to $f(x) = c(1 - qx - px^2)$, $c = 6/(3 + p)$. (e) If $p \ge eq$ then
$pe^x + qe^{1-x}$ is monotone increasing for $0 \le x \le 1$, and $\int_0^1 |pe^x + qe^{1-x} - e^{1/2}|\,dx =
(p - q)(e^{1/2} - 1)^2 < 0.43$. If $q \le p \le eq$ then $pe^x + qe^{1-x}$ lies between $2\sqrt{pqe}$ and
$p + qe$, so $\int_0^1 |pe^x + qe^{1-x} - \frac12(p + qe + 2\sqrt{pqe})|\,dx \le \frac12(\sqrt{p} - \sqrt{qe})^2 < 0.4$; and if
$p < q$ we may use a symmetrical argument. Thus for all $p$ and $q$ there is a constant
$C$ such that $\int_0^1 |pe^x + qe^{1-x} - C|\,dx < 0.43$. Let $\delta_n(x) = f_n(x) - f(x)$. Then $\delta_{n+1}(y) =
(1 - e^{y-1})\int_0^1 (pe^x + qe^{1-x} - C)\delta_n(x)\,dx + p\int_0^{1-y} e^{y-1+x}\delta_n(x)\,dx + q\int_y^1 e^{y-x}\delta_n(x)\,dx$;
hence if $|\delta_n(y)| \le \alpha_n$, $|\delta_{n+1}(y)| \le (1 - e^{y-1}) \cdot 1.43\alpha_n \le 0.91\alpha_n$. (f) For all $n \ge 0$,
$(1 - x)^n/n!$ is the probability that the run length exceeds $n$. (g) $\int_0^1 (pe^x + qe^{1-x})f(x)\,dx =
6/(3 + p)$.

26.  (a)  Consider  the  number  of  permutations  with  n+r  + 1 elements  and  n left-to-right 
minima,  where  the  rightmost  element  is  not  the  smallest,  (b)  Use  the  fact  that 

n 

.n  — r — 1 J ’ 

by  the  definition  of  Stirling  numbers  in  Appendix  B.  (c)  Add  r + 1 to  the  mean,  using 
the  fact  that  £„>o[n;tr](rc  + r)/(n  + r + 1)1  = 1,  to  get  En>olT ]/(«  + r - 1)!. 

The formula in (b) is due to P. Appell, Archiv der Math. und Physik 65 (1880),
171-175. We have, incidentally, $\left[\!\left[{r+k \atop k}\right]\!\right] = (r + k)!\,[x^k z^r]\,e^{xf(z)}$, where $f(z) = z/2 +
z^2/3 + \cdots = -z^{-1}\ln(1 - z) - 1$; hence $c_r = [z^r]\,(r + 1 + f(z))e^{f(z)}$. The number of
derangements of $n$ objects having $k$ cycles is $\left[\!\left[{n \atop k}\right]\!\right]$; see
J. Riordan, An Introduction to Combinatorial Analysis (Wiley, 1958), §4.4.

27. When $P'/P = 2(e^{-\theta} - 1 + \theta)/(1 - 2\theta + \theta^2 + 2\theta e^{-\theta})$, for $0 \le \theta \le 1$, the steady-state
average run length will be $2P/(1 - 2\theta + \theta^2 + 2\theta e^{-\theta})$. [See Information Processing Letters
21  (1985),  239-243.] 
Dobosiewicz has also observed that we can continue the replacement selection
mechanism even longer, because we can be inputting from the front of the reservoir
queue while outputting to its rear. For example, if $P' = .5P$ and we continue replacement
selection until the current run contains $.209P$ records, the average run length
increases from about $2.55P$ to about $2.61P$ with this modification. If $P' = P$ and we
continue replacement selection until only $.314P$ records remain in the current run, the
average run length increases from $eP$ to about $3.034P$. [See Comp. J. 27 (1984), 334-339,
where an even more efficient method called "merge replacement" is also presented.]
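The parametric formulas of exercise 27 can be evaluated directly. The sketch below (hypothetical helper names) computes $(P'/P, \text{run length}/P)$ from $\theta$, and inverts the first coordinate by bisection on $[0, 1]$, which is valid here because the endpoints give $P'/P = 0$ and $P'/P = 1$. For $P' = .5P$ it gives a run length of about $2.55P$, the unmodified figure quoted in the preceding paragraph.

```python
import math

def ratio_and_run(theta):
    """Return (P'/P, run length / P) for 0 <= theta <= 1."""
    d = 1 - 2 * theta + theta ** 2 + 2 * theta * math.exp(-theta)
    return 2 * (math.exp(-theta) - 1 + theta) / d, 2 / d

def run_for_reservoir(r, tol=1e-12):
    """Run length / P when P'/P = r, for 0 <= r <= 1 (bisection on theta)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ratio_and_run(mid)[0] < r:
            lo = mid
        else:
            hi = mid
    return ratio_and_run((lo + hi) / 2)[1]
```

At $\theta = 1$ the formulas reduce to $P' = P$ and run length $eP$, and at $\theta = 0$ to $P' = 0$ and run length $2P$.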

28.  For  multiway  merging  there  is  comparatively  little  problem,  since  P stays  constant 
and  records  are  processed  sequentially  on  each  file;  but  when  forming  initial  runs,  we 
would  like  to  vary  the  number  of  records  in  memory  depending  on  their  lengths.  We 
could  keep  a heap  of  as  many  records  as  will  fit  in  memory,  using  dynamic  storage 
allocation  as  described  in  Section  2.5.  M.  A.  Goetz  [Proc.  AFIPS  Joint  Computer 
Conf.  25  (1964),  602-604]  has  suggested  another  approach,  breaking  each  record  into 
fixed-size  parts  that  are  linked  together;  they  occupy  space  at  the  leaves  of  the  tree, 
but  only  the  leading  part  participates  in  the  tournament. 

29. The top $2^k$ loser nodes go into the corresponding host positions. The remaining
loser nodes consist of $2^k$ subtrees of $2^n - 1$ nodes each; they are assigned to host nodes
in  symmetric  order  — the  leftmost  subtree  into  the  leftmost  host  node,  etc.  [See  K.  Efe 
and  N.  Eleser,  Acta  Informatica  34  (1997),  429-447.] 

30. Suppose $t$ of the host nodes hold a connected $2^n$-node subgraph of the complete
$2^{n+k}$-node loser tree. That tree has one node at level 0 and $2^{l-1}$ nodes at level $l$ for
$1 \le l \le n + k$. A subtree rooted at level $l \ge 1$ has $2^{n+k+1-l} - 1$ nodes; therefore
the roots of $t$ disjoint $2^n$-node subtrees must all be on levels $\le k$. And each of these
subtrees must contain at least one node on level $k$, because there are only $2^{k-1} < 2^n$
nodes on levels $< k$. It follows that $t \le 2^{k-1}$. But the number of edges in the host
graph is at least $t + 2(2^k - t) - 1$, by (ii) and (iii), since there are at least this many
loser nodes whose parent has a different image in the host.

[The hypothesis $n \ge k$ is necessary: When $n = k - 1$ there is a suitable host graph
with $2^k + 2^{k-1} - 2$ edges.]

SECTION  5.4.2 
2. After the first merge phase, all remaining dummies are on tape T, and there are
at most $a_n - a_{n-1} \le a_{n-1}$ of them. Therefore they all disappear during the second
merge phase.

3. We have $(D[1], D[2], \ldots, D[T]) = (a_n - a_{n-p}, a_n - a_{n-p+1}, \ldots, a_n - a_n)$, so the
condition follows from the fact that the $a$'s are nondecreasing. The condition is
important to the validity of the algorithm, since steps D2 and D3 never decrease
$D[j + 1]$ more often than $D[j]$.

4. $(1 - z - z^2 - z^3 - z^4 - z^5)a(z) = 1$ because of (3). And $t(z) = \sum_{n\ge1}(a_n + b_n + c_n + d_n + e_n)z^n =$

$(z + \cdots + z^5)a(z) + (z + \cdots + z^4)a(z) + \cdots + za(z) = (5z + 4z^2 + 3z^3 + 2z^4 + z^5)a(z)$.
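Both generating-function identities can be checked numerically. The sketch below assumes that the six-tape perfect distributions of (3) take level $n$ to level $n + 1$ by $(a, b, c, d, e) \mapsto (a + b, a + c, a + d, a + e, a)$, starting from $(1, 0, 0, 0, 0)$; the helper names are illustrative.

```python
def perfect_levels(levels):
    """Six-tape perfect distributions (a_n, b_n, c_n, d_n, e_n) for n = 0..levels."""
    a, b, c, d, e = 1, 0, 0, 0, 0
    rows = [(a, b, c, d, e)]
    for _ in range(levels):
        a, b, c, d, e = a + b, a + c, a + d, a + e, a
        rows.append((a, b, c, d, e))
    return rows

def a_term(rows, n):
    """a_n, taking a_n = 0 for n < 0."""
    return rows[n][0] if n >= 0 else 0

def t_from_gf(rows, n):
    """Coefficient of z^n in (5z + 4z^2 + 3z^3 + 2z^4 + z^5) a(z), for n >= 1."""
    return sum(w * a_term(rows, n - k) for k, w in enumerate((5, 4, 3, 2, 1), start=1))

rows = perfect_levels(12)
# (1 - z - ... - z^5) a(z) = 1 means a_n = a_{n-1} + ... + a_{n-5} for n >= 1.
assert all(rows[n][0] == sum(a_term(rows, n - k) for k in range(1, 6)) for n in range(1, 13))
assert all(sum(rows[n]) == t_from_gf(rows, n) for n in range(1, 13))
```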

5. Let $g_p(z) = (z - 1)f_p(z) = z^{p+1} - 2z^p + 1$, and let $h_p(z) = z^{p+1} - 2z^p$. Rouché's theorem
[J. École Polytechnique 21,37 (1858), 1-34] tells us that $h_p(z)$ and $g_p(z)$ have the
same number of roots inside the circle $|z| = 1 + \epsilon$, provided $|h_p(z)| > |h_p(z) - g_p(z)| = 1$
on the circle. If $\phi^{-1} > \epsilon > 0$ we have $|h_p(z)| \ge (1 + \epsilon)^p(1 - \epsilon) > (1 + \phi^{-1})^2(1 - \phi^{-1}) = 1$.
Hence $g_p$ has $p$ roots of magnitude $\le 1$. They are distinct, since $\gcd(g_p(z), g_p'(z)) =
\gcd(g_p(z), (p + 1)z - 2p) = 1$. [AMM 67 (1960), 745-752.]

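6. Let $c_0 = -\alpha p(\alpha^{-1})/q'(\alpha^{-1})$. Then $p(z)/q(z) - c_0/(1 - \alpha z)$ is analytic in $|z| < R$ for
some $R > \alpha^{-1}$; hence $[z^n]\,p(z)/q(z) = c_0\alpha^n + O(R^{-n})$. Thus, $\ln S = n\ln\alpha + \ln c_0 +
O((\alpha R)^{-n})$; and $n = (\ln S/\ln\alpha) + O(1)$ implies that $O((\alpha R)^{-n}) = O(S^{-\epsilon})$. Similarly,
let $c_1 = \alpha^2 p(\alpha^{-1})/q'(\alpha^{-1})^2$ and $c_2 = -\alpha p'(\alpha^{-1})/q'(\alpha^{-1})^2 + \alpha p(\alpha^{-1})q''(\alpha^{-1})/q'(\alpha^{-1})^3$,
and consider $p(z)/q(z)^2 - c_1/(1 - \alpha z)^2 - c_2/(1 - \alpha z)$.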

7. Let $\alpha_p = 2x$ and $z = -1/2^{p+1}$. Then $x^{p+1} = x^p + z$, so we have the convergent
series $\alpha_p = 2\sum_{k\ge0}\binom{1-kp}{k}z^k/(1 - kp) = 2 - 2^{-p} - p2^{-2p-1} + O(p^2 2^{-3p})$ by Eq. 1.2.6-(25).
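A numerical check of this expansion: the sketch below (illustrative helper names) finds the root of $z^{p+1} - 2z^p + 1$ lying in $(1.5, 2)$ by bisection, and compares it with the first three terms of the series. For $p \ge 2$ we have $g(1.5) = 1 - 1.5^p/2 < 0$ and $g(2) = 1 > 0$, while the other positive root $z = 1$ is outside the interval.

```python
def alpha_p(p, iters=200):
    """Root of g(z) = z^(p+1) - 2z^p + 1 in (1.5, 2), found by bisection (p >= 2)."""
    def g(z):
        return z ** (p + 1) - 2 * z ** p + 1
    lo, hi = 1.5, 2.0          # g(lo) < 0 < g(hi) is maintained throughout
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def alpha_approx(p):
    """Leading terms 2 - 2^(-p) - p 2^(-2p-1) of the series."""
    return 2 - 2.0 ** -p - p * 2.0 ** (-2 * p - 1)
```

For $p = 2$ the root is the golden ratio, and for larger $p$ the discrepancy shrinks like $p^2 2^{-3p}$, as the $O$ term predicts.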

Note:  It  follows  that  the  quantity  p in  exercise  6 becomes  approximately  log4  S 
as  p increases.  Similarly,  for  both  Table  5 and  Table  6,  the  coefficient  c approaches 
1/ ((0  + 2)  In  4>)  on  a large  number  of  tapes. 

8. Evidently $N^{(p)}_0 = 1$, $N^{(p)}_m = 0$ for $m < 0$, and by considering the different possibilities
for the first summand we have $N^{(p)}_m = N^{(p)}_{m-1} + \cdots + N^{(p)}_{m-p}$ when $m > 0$. Hence
$N^{(p)}_m = F^{(p)}_{m+p-1}$. [Lehrbuch der Combinatorik (Leipzig: Teubner, 1901), 136-137.]
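A brute-force check of $N^{(p)}_m = F^{(p)}_{m+p-1}$ for small cases. The sketch assumes the $p$th-order Fibonacci numbers are defined by $F^{(p)}_0 = \cdots = F^{(p)}_{p-2} = 0$, $F^{(p)}_{p-1} = 1$, with each later term the sum of the preceding $p$; the helper names are illustrative.

```python
from functools import lru_cache

def p_fib(p, n):
    """F_n^(p): zero for 0 <= n <= p - 2, F_{p-1} = 1, then sums of the previous p terms."""
    F = [0] * (p - 1) + [1]
    while len(F) <= n:
        F.append(sum(F[-p:]))
    return F[n]

@lru_cache(maxsize=None)
def compositions(m, p):
    """N_m^(p): ordered ways to write m as a sum of parts from {1, ..., p}."""
    if m < 0:
        return 0
    if m == 0:
        return 1
    # Condition on the first summand k, as in the answer.
    return sum(compositions(m - k, p) for k in range(1, p + 1))
```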

9. Consider the position of the leftmost 0, if there is one; we find $K^{(p)}_m = F^{(p)}_{m+p}$. Note:
There is a simple one-to-one correspondence between such sequences of 0s and 1s and
the representations of $m + 1$ considered in exercise 8: Place a 0 at the right end of the
sequence, and look at the positions of all the 0s.
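The count $K^{(p)}_m$ of 0-1 sequences of length $m$ without $p$ consecutive 1s can be confirmed by enumeration. As before, the sketch assumes $F^{(p)}_0 = \cdots = F^{(p)}_{p-2} = 0$ and $F^{(p)}_{p-1} = 1$; the helper names are illustrative.

```python
from itertools import product

def p_fib(p, n):
    """F_n^(p): zero for 0 <= n <= p - 2, F_{p-1} = 1, then sums of the previous p terms."""
    F = [0] * (p - 1) + [1]
    while len(F) <= n:
        F.append(sum(F[-p:]))
    return F[n]

def no_run_of_p_ones(m, p):
    """K_m^(p): 0/1 sequences of length m containing no p consecutive 1s (brute force)."""
    return sum(1 for bits in product("01", repeat=m) if "1" * p not in "".join(bits))
```

The identity $K^{(p)}_m = F^{(p)}_{m+p} = N^{(p)}_{m+1}$ matches the correspondence with the compositions of exercise 8.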

10. Lemma: If $n = F^{(p)}_{j_1} + \cdots + F^{(p)}_{j_m}$ is such a representation, with $j_1 > \cdots > j_m \ge p$,
we have $n < F^{(p)}_{j_1+1}$. Proof: The result is obvious if $m < p$; otherwise let $k$ be minimal
with $j_k > j_{k+1} + 1$; we have $k < p$, and by induction $F^{(p)}_{j_{k+1}} + \cdots + F^{(p)}_{j_m} < F^{(p)}_{j_{k+1}+1}$, hence

$n < F^{(p)}_{j_1} + \cdots + F^{(p)}_{j_k} + F^{(p)}_{j_{k+1}+1} \le F^{(p)}_{j_1} + F^{(p)}_{j_1-1} + \cdots + F^{(p)}_{j_1-k} \le F^{(p)}_{j_1+1}$.

The stated result can now be proved, by induction on $n$. If $n > 0$ let $j$ be
maximal such that $F^{(p)}_j \le n$. The lemma shows that each representation of $n$ must
consist of $F^{(p)}_j$ plus a representation of $n - F^{(p)}_j$. By induction, $n - F^{(p)}_j$ has a unique
representation of the desired form, and this representation does not include all of the
numbers $F^{(p)}_{j-1}, \ldots, F^{(p)}_{j-p+1}$ because $j$ is maximal.

Notes: This number system was implicitly known in 14th-century India (see
Section 7.2.1.7). We have considered the case $p = 2$ in exercise 1.2.8-34. There is
a simple algorithm to go from the representation of $n$ to that of $n + 1$, working on the
sequence $c_l \ldots c_1c_0$ of 0s and 1s such that $n = \sum_j c_jF^{(p)}_{j+p}$. For example, if $p = 3$, we
look at the rightmost digits, changing $\ldots 0$ to $\ldots 1$, $\ldots 01$ to $\ldots 10$, $\ldots 011$ to $\ldots 100$;
then we "carry" to the left if necessary, replacing $\ldots 0111 \ldots$ by $\ldots 1000 \ldots$. (See the
sequences of 0s and 1s in exercise 9, in the order listed.) A similar number system
has been studied by W. C. Lynch [Fibonacci Quarterly 8 (1970), 6-22], who found a
very interesting way to make it govern both the distribution and merge phases of a
polyphase sort.
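The uniqueness theorem and the greedy construction in its proof can be tested exhaustively for short digit strings. The sketch assumes digit $c_j$ carries weight $F^{(p)}_{j+p}$ (so $c_0$ has weight 1) and the same definition of $F^{(p)}$ as in exercise 8; `representation` and `check_number_system` are hypothetical names.

```python
from itertools import product

def p_fib_list(p, count):
    """At least `count` values F_0^(p), F_1^(p), ..., with F_0 = ... = F_{p-2} = 0,
    F_{p-1} = 1, and each later term the sum of the previous p terms."""
    F = [0] * (p - 1) + [1]
    while len(F) < count:
        F.append(sum(F[-p:]))
    return F

def representation(n, p, L):
    """Greedy digits c_{L-1} ... c_0 with n = sum_j c_j F_{j+p}; assumes 0 <= n < F_{L+p}."""
    F = p_fib_list(p, L + p + 1)
    digits = []
    for j in reversed(range(L)):
        if F[j + p] <= n:          # take the largest available p-Fibonacci number
            digits.append(1)
            n -= F[j + p]
        else:
            digits.append(0)
    assert n == 0, "n was too large for L digits"
    return digits

def check_number_system(p, L):
    """True if each n < F_{L+p} has exactly one length-L digit string with no p
    consecutive 1s, and the greedy digits reproduce that string."""
    F = p_fib_list(p, L + p + 1)
    seen = {}
    for bits in product((0, 1), repeat=L):            # bits[i] is c_{L-1-i}
        if "1" * p in "".join(map(str, bits)):
            continue
        value = sum(b * F[L - 1 - i + p] for i, b in enumerate(bits))
        if value in seen:
            return False
        seen[value] = list(bits)
    if sorted(seen) != list(range(F[L + p])):
        return False
    return all(representation(n, p, L) == bits for n, bits in seen.items())
```

For example, with $p = 3$ the weights of $c_2c_1c_0$ are 4, 2, 1, so `representation(3, 3, 3)` gives the digits 011 and `representation(4, 3, 3)` gives 100, matching the increment rule above.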

12. The $k$th power contains the perfect distributions for levels $k - 4$ through $k$, on
successive rows, with the largest elements to the right.

13.  By  induction  on  the  level. 

14.  (a)  n(l)  = 1,  so  assume  that  k > 1.  The  law  Tnk  = x)  H hT(n_p)(i:_1) 

shows  that  Tnk  < T(n+i)k  if  and  only  if  < Tn(fc_!).  Let  r be  any  positive 

integer,  and  let  n'  be  minimal  such  that  > Tn'(k-i)\  then  T(n-r)(k-i)  > 

Tn(k-i)  for  all  n > n' , since  this  relation  is  trivial  for  n > n(k  — 1)  + r and  otherwise 
T(n-r)(k-i ) > T(n'-r)(k-i)  > Tn>(k- 1)  > Tn(k-i).  (b)  The  same  argument  with  r = 
n - n'  shows  that  Tn’k’  < Tny  implies  T(n,_j)fc/  < T(n__j)k,  for  all  j > 0;  hence  the 
recurrence  implies  that  < Tin_j)k  for  all  j > 0 and  k > k'.  (c)  Let  £(S)  be 

the  least  n such  that  E„(S)  assumes  its  minimum  value.  The  sequence  M„  exists  as 
desired  if  and  only  if  C(S)  < l(S  + 1)  for  all  S.  Suppose  n — £(S)  > (,{S  + 1)  = n,  so 
that  E n(S)  < E ni(S)  and  En(5  + 1)  > E„/(S  + 1).  There  is  some  smallest  S'  such  that 
E n(S')  < E n'{S'),  and  we  have  m = E„(S')  — E„(5'  -1)  < En/(S')  — En<(S'  - 1)  = m . 
Then  Tn'k  < S'  < ’Yl'k=\  Tnk',  hence  there  is  some  k'  < m such  that  Tnik>  < Tnk> . 

Similarly  we  have  l = E„(S+l)-En(S)  > En/(S+1)-En/(S)  = hence  Ylk=\  Tn> k > 
S + 1 > J2k= i Tnk  • Since  l'  > m'  > m,  there  is  some  k > m such  that  Tnik  > Tnk- 
But  this  contradicts  part  (b). 

15.  This  theorem  has  been  proved  by  D.  A.  Zave,  whose  article  was  cited  in  the  text. 

16.  D.  A.  Zave  has  shown  that  the  number  of  records  input  (and  output)  is  S logT_1  S+ 
l^logj..!  logT_!  S + O(S). 

17. Let $T = 3$; $A_n(x) = 6x^6 + 35x^7 + 56x^8 + \cdots$, $B_n(x) = x^6 + 15x^7 + 35x^8 + \cdots$,
$T_n(x) = 7x^6 + 50x^7 + 91x^8 + 64x^9 + 19x^{10} + 2x^{11}$. The optimum distribution for
S = 144  requires  55  runs  on  T2,  and  this  forces  a nonoptimum  distribution  for  S = 145. 
D.  A.  Zave  has  studied  near-optimum  procedures  of  this  kind. 

18. Let $S = 9$, $T = 3$, and consider the following two patterns.

Optimum polyphase:                      Alternative:
T1      T2      T3      Cost            T1      T2      T3      Cost
0²1⁶    0²1³    —                       0¹1⁶    0¹1³    —
1³      —       0²2³    6               1³      —       0¹2³    6
—       1²3¹    2²      5               —       1¹3²    2¹      7
3²      3¹      —       6               3¹      3²      —       3
3¹      —       6¹      6               —       3¹      6¹      6
—       9¹      —       9               9¹      —       —       9
                        32                                      31

(Still another way to improve on "optimum" polyphase is to reconsider where dummy
runs appear on the output tape of every merge phase. For example, the result of
merging 0²1³ with 0²1³ might be regarded as 2¹0¹2¹0¹2¹ instead of 0²2³. Thus, many
unresolved questions of optimality remain.)
19.

Level   T1    T2        T3        T4    Total       Final output on
0       1     0         0         0     1           T1
1       0     1         1         1     3           T6
2       1     1         1         0     3           T5
3       1     2         1         1     5           T4
4       2     2         2         1     7           T3
5       2     4         3         2     11          T2
6       4     5         4         2     15          T1
7       5     8         6         4     23          T6
n       a_n   b_n       c_n       d_n   t_n         T(k)
n+1     b_n   c_n+a_n   d_n+a_n   a_n   t_n+2a_n    T(k−1)

20. $a(z) = 1/(1 - z^2 - z^3 - z^4)$, $t(z) = (3z + 3z^2 + 2z^3 + z^4)/(1 - z^2 - z^3 - z^4)$,
$\sum_{n\ge1} T_n(x)z^n = x(3z + 3z^2 + 2z^3 + z^4)/(1 - x(z^2 + z^3 + z^4))$. $D_n = A_{n-1} + 1$,
$C_n = A_{n-1}A_{n-2} + 1$, $B_n = A_{n-1}A_{n-2}A_{n-3} + 1$, $A_n = A_{n-2}A_{n-3}A_{n-4} + 1$.
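The table of exercise 19 and the generating functions of exercise 20 can be cross-checked mechanically. The sketch assumes the level rule shown in the table's last row, $(a, b, c, d) \mapsto (b, c + a, d + a, a)$, and expands $a(z)$ and $t(z)$ as power series; `table_rows` and `series_coeffs` are illustrative names.

```python
def table_rows(levels):
    """Rows (a_n, b_n, c_n, d_n) for n = 0..levels, using level n+1 = (b, c + a, d + a, a)."""
    rows = [(1, 0, 0, 0)]
    for _ in range(levels):
        a, b, c, d = rows[-1]
        rows.append((b, c + a, d + a, a))
    return rows

def series_coeffs(num, den, count):
    """First `count` power-series coefficients of num(z)/den(z), assuming den[0] == 1."""
    out = []
    for n in range(count):
        c = num[n] if n < len(num) else 0
        c -= sum(den[k] * out[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        out.append(c)
    return out

DEN = [1, 0, -1, -1, -1]                       # 1 - z^2 - z^3 - z^4
rows = table_rows(7)
a_gf = series_coeffs([1], DEN, 8)              # a(z)
t_gf = series_coeffs([0, 3, 3, 2, 1], DEN, 8)  # t(z); its constant term is 0
```

The totals agree with $t(z)$ from level 1 onward; the level-0 total of 1 is not part of $t(z) = \sum_{n\ge1} t_nz^n$.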

21.  333343333332322  3333433333323  33334333333  3333433  333323  T5 

22. $t_n - t_{n-1} - t_{n-2} = -1 + 3[n \bmod 3 = 1]$. (This Fibonacci-like relation follows from
the fact that $1 - z^2 - 2z^3 - z^4 = (1 - z - z^2)(1 - \omega z)(1 - \omega^2 z)$, where $\omega^3 = 1$ and $\omega \ne 1$.)

23. In place of (25), the run lengths during the first half of the $n$th merge phase are $s_n$,
and on the second half they are $t_n$, where

$s_n = t_{n-2} + t_{n-3} + s_{n-3} + s_{n-4}$,   $t_n = t_{n-2} + s_{n-2} + s_{n-3} + s_{n-4}$.

Here we regard $s_n = t_n = 1$ for $n < 0$. [In general, if $v_{n+1}$ is the sum of the first $2r$ terms
of $v_{n-1} + \cdots + v_{n-p}$, we have $s_n = t_n = t_{n-2} + \cdots + t_{n-r} + 2t_{n-r-1} + t_{n-r-2} + \cdots + t_{n-p}$; if
$v_{n+1}$ is the sum of the first $2r - 1$, we have $s_n = t_{n-2} + \cdots + t_{n-r-1} + s_{n-r-1} + \cdots + s_{n-p}$,
$t_n = t_{n-2} + \cdots + t_{n-r} + s_{n-r} + \cdots + s_{n-p}$.]

In  place  of  (27)  and  (28),  An  = (Un-M-iUn-aVn-aUn-aVn-aUn-M-i)  + 1, 
. . . , Dn  — (Un-iVn-i)  + 1,  En  = (Un-2Vn-2Un-3)  + 1;  Vn+1  = {Un-iVn-\Un-2)  + 1, 
Un  = (Vn-2Un-3Vn-3Un-4Vn-4)  + 1. 

25.  I16  l8  — l8 

l12  l4  R 1824 

l8  — 24  R 


R 

s^e1 

81 

8° 

16° 

R 

81 

— 

16l 

161 

8° 

R 

R 

161 

— 

24° 

161 

161 

R 

24°32l 

16° 

16° 

321 

(R) 

26. When $2^n$ are sorted, $n \cdot 2^n$ initial runs are processed while merging; each half phase
(with a few exceptions) merges $2^{n-2}$ and rewinds $2^{n-1}$. When $2^n + 2^{n-1}$ are sorted,
$n \cdot 2^n + (n - 1) \cdot 2^{n-1}$ initial runs are processed while merging; each half phase (with a
few exceptions) merges $2^{n-2}$ or $2^{n-1}$ and rewinds $2^{n-1} + 2^{n-2}$.

27. It works if and only if the gcd of the distribution numbers is 1. For example,
let there be six tapes; if we distribute $(a, b, c, d, e)$ to T1 through T5, where $a >
b > c > d > e > 0$, the first phase leaves a distribution $(a - e, b - e, c - e, d - e, e)$, and
$\gcd(a - e, b - e, c - e, d - e, e) = \gcd(a, b, c, d, e)$, since any common divisor of one set of
numbers divides the others too. The process decreases the number of runs at each
phase until $\gcd(a, b, c, d, e)$ runs are left on a single tape.
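A small simulation of the argument. The model is a simplification assumed for illustration: only run counts are tracked (no run lengths or dummy runs), and each phase merges $m$ runs from every nonempty tape onto an empty tape, where $m$ is the smallest nonzero count. The gcd of the counts is preserved until a single tape remains; `leftover_runs` is a hypothetical name.

```python
from functools import reduce
from math import gcd

def leftover_runs(dist):
    """Run counts on the input tapes; one empty output tape is added.

    Each phase merges m runs from every nonempty tape onto an empty tape,
    where m is the smallest nonzero count. Returns the runs left at the end."""
    tapes = list(dist) + [0]
    while sum(1 for c in tapes if c > 0) > 1:
        inputs = [i for i, c in enumerate(tapes) if c > 0]
        out = tapes.index(0)               # an empty tape always exists here
        m = min(tapes[i] for i in inputs)
        for i in inputs:
            tapes[i] -= m
        tapes[out] += m
    return max(tapes)

# Usage: leftover_runs(d) should equal reduce(gcd, d) for any distribution d.
```

Each phase with $k \ge 2$ input tapes removes $(k - 1)m$ runs in total, so the loop always terminates.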

[Nonpolyphase  distributions  sometimes  turn  out  to  be  superior  to  polyphase  under 
certain  configurations  of  dummy  runs,  as  shown  in  exercise  18.  This  phenomenon  was 
first  observed  by  B.  Sackman  about  1963.] 

28. We get any such $(a, b, c, d, e)$ by starting with $(1, 0, 0, 0, 0)$ and doing the following
operation exactly $n$ times: Choose $x$ in $\{a, b, c, d, e\}$, and add $x$ to each of the other
four elements of $(a, b, c, d, e)$.

To show that $a + b + c + d + e \le t_n$, we shall prove that if $a \ge b \ge c \ge d \ge e$ on level $n$,
we always have $a \le a_n$, $b \le b_n$, $c \le c_n$, $d \le d_n$, $e \le e_n$. The proof follows by induction,
since the level $n + 1$ distributions are $(b+a, c+a, d+a, e+a, a)$, $(a+b, c+b, d+b, e+b, b)$,
$(a+c, b+c, d+c, e+c, c)$, $(a+d, b+d, c+d, e+d, d)$, $(a+e, b+e, c+e, d+e, e)$.
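The induction can be confirmed by exhaustive search for small levels. The sketch enumerates every level-$n$ distribution produced by the operation described above (sorted into nonincreasing order) and checks it against the perfect distribution, assuming the perfect level $n + 1$ is $(a + b, a + c, a + d, a + e, a)$; the helper names are illustrative.

```python
def perfect(n):
    """Perfect level-n distribution (a_n, b_n, c_n, d_n, e_n), assuming level n+1
    is (a+b, a+c, a+d, a+e, a) and level 0 is (1, 0, 0, 0, 0)."""
    a, b, c, d, e = 1, 0, 0, 0, 0
    for _ in range(n):
        a, b, c, d, e = a + b, a + c, a + d, a + e, a
    return (a, b, c, d, e)

def reachable(n):
    """All level-n distributions obtainable by the operation in the answer,
    each sorted into nonincreasing order."""
    level = {(1, 0, 0, 0, 0)}
    for _ in range(n):
        nxt = set()
        for dist in level:
            for i, x in enumerate(dist):
                # Add the chosen element x to each of the other four elements.
                new = [y if j == i else y + x for j, y in enumerate(dist)]
                nxt.add(tuple(sorted(new, reverse=True)))
        level = nxt
    return level
```

Choosing the largest element every time yields the perfect distribution itself, which is why the bound is attained.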

30. The following table has been computed by J. A. Mortenson.

Level   T = 5   T = 6   T = 7   T = 8   T = 9   T = 10
1       2       2       2       2       2       2       M_1
2       4       5       6       7       8       9       M_2
3       4       5       6       7       8       9       M_3
4       8       8       10      12      14      16      M_4
5       10      14      18      17      20      23      M_5
6       18      20      26      27      32      31      M_6
7       26      32      46      47      56      42      M_7
8       44      53      74      82      92      92      M_8
9       68      83      122     111     138     139     M_9
10      112     134     206     140     177     196     M_10
11      178     197     317     324     208     241     M_11
12      290     350     401     488     595     288     M_12
13      466     566     933     640     838     860     M_13
14      756     917     1371    769     1064    1177    M_14
15      1220    1481    1762    2078    1258    1520    M_15
16      1976    2313    4060    2907    3839    1821    M_16

31.  [Random  Structures  & Algorithms  5 (1994),  102-104.]  Kd(n)  = F^d}2  = N^d}d_1. 
We have n − d − 1 = a_1 + ··· + a_r if the tree has r + 1 leaves and the (k + 1)st leaf has
a_k − 1 ancestors distinct from the ancestors of the first k leaves. (The seven example
trees correspond respectively to 1+1+1+1, 1+1+2, 1+2+1, 1+3, 2+1+1, 2+2, and 3+1.)

SECTION  5.4.3 

1. The tape-splitting polyphase is superior with respect to the average number of
times each record is processed (Table 5.4.2-6), when there are 6, 7, or 8 tapes.

2. The methods are essentially identical when the number of initial runs is a Fibonacci
number; but the manner of distributing dummy runs in other cases is better with
polyphase. The cascade algorithm puts 1 on T1, then 1 on T2, 1 on T1, 2 on T2,
3 on T1, 5 on T2, etc., and step C8 never finds D[p − 1] = M[p − 1] when p = 2.
In effect, all dummies are on one tape, and this is less efficient than the method of
Algorithm 5.4.2D.
3. (Distribution stops after putting 12 runs on T3 during Step (3,3).)

T1       T2        T3      T4         T5      T6
1^26     1^21      1^24    1^14       1^15    -
1^5      -         1^12    1^2 2^7    1^15    2^2 4^12
8^4      6^2 9^3   5^2     6^3        1^1     -
-        9^1       23^1    17^1       25^1    26^1
100^1    -         -       -          -       -

4.  Induction.  (See  exercise  5.4.2-28.) 

5. When there are a_n initial runs, the kth pass outputs a_{n−k} runs of length a_k, then
b_{n−k} of length b_k, etc.

6.  ( 1 1 1 1 1 )
    ( 1 1 1 1 0 )
    ( 1 1 1 0 0 )
    ( 1 1 0 0 0 )
    ( 1 0 0 0 0 )
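The matrix can be applied repeatedly to generate cascade distributions, and the description of the kth pass in exercise 5 can be checked against it. In the sketch below (names ours), level n is the matrix applied n times to (1, 0, 0, 0, 0). Because the matrix is symmetric, a_{n−k}a_k + b_{n−k}b_k + ··· + e_{n−k}e_k comes out equal to a_n, so each pass handles all a_n initial runs.

```python
def cascade_levels(levels, size=5):
    """Cascade distributions (a_n, ..., e_n) for six tapes, levels 0..levels.

    Row i of the matrix has ones in columns 0 .. size-1-i, so level n+1 is
    (a+b+c+d+e, a+b+c+d, a+b+c, a+b, a) computed from level n.
    """
    matrix = [[1 if j < size - i else 0 for j in range(size)] for i in range(size)]
    dist = (1,) + (0,) * (size - 1)
    out = [dist]
    for _ in range(levels):
        dist = tuple(sum(m * x for m, x in zip(row, dist)) for row in matrix)
        out.append(dist)
    return out


def pass_total(levels, n, k):
    """Initial runs written on pass k: a_{n-k} a_k + b_{n-k} b_k + ... (exercise 5)."""
    return sum(x * y for x, y in zip(levels[n - k], levels[k]))


levels = cascade_levels(6)
print(levels[4])  # (55, 50, 41, 29, 15)
```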

7. We save e_2 e_{n−2} + e_3 e_{n−3} + ··· + e_n e_0 initial run lengths (see exercise 5), which
may also be written a_1 a_{n−3} + a_2 a_{n−4} + ··· + a_{n−2} a_0; it is [z^{n−2}](A(z)^2 − A(z)).

8. The denominator of A(z) has distinct roots and greater degree than the numerator,
hence A(z) = Σ q_3(ρ)/((1 − ρz) ρ (1 − q_4'(ρ))) summed over all roots ρ of q_4(ρ) = ρ. The
special form of ρ is helpful in evaluating q_3(ρ) and q_4'(ρ).

9. The formulas hold for all large n, by (8) and (12), in view of the value of q_m(2 sin θ_k).
To show that they hold for all n we need to know that q_{m−1}(z) is the quotient when
q_{r−1}(z) q_m(z) is divided by q_r(z) − z, for 0 < m < r. This can be proved either
by using (10) and noting that cancellations bring down the degree of the polynomial
q_{r−1}(z) q_m(z) − q_r(z) q_{m−1}(z), or by noting that A(z)^2 + B(z)^2 + ··· + E(z)^2 → 0 as
z → ∞ (see exercise 5), or by finding explicit formulas for the numerators of B(z),
C(z), etc.

10. E(z) = r_1(z)A(z); D(z) = r_2(z)A(z) − r_1(z); C(z) = r_3(z)A(z) − r_2(z); B(z) =
r_4(z)A(z) − r_3(z); A(z) = r_5(z)A(z) + 1 − r_4(z). Thus A(z) = (1 − r_4(z))/(1 − r_5(z)).
[Notice that r_m(2 sin θ) = sin(2mθ)/cos θ; hence r_m(z) is the Chebyshev polynomial
(−1)^{m+1} U_{2m−1}(z/2).]
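The Chebyshev identity is easy to spot-check numerically. The sketch below (names ours) evaluates U_{2m−1}(z/2) with the standard three-term recurrence for Chebyshev polynomials of the second kind. It then compares (−1)^{m+1} U_{2m−1}(z/2) with sin(2mθ)/cos θ at z = 2 sin θ.

```python
import math


def chebyshev_u(n, x):
    """U_n(x) from U_0 = 1, U_1 = 2x, U_{k+1} = 2x U_k - U_{k-1}."""
    prev, cur = 1.0, 2.0 * x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


def r_closed_form(m, z):
    """(-1)^(m+1) U_{2m-1}(z/2), the closed form given above for r_m(z)."""
    return (-1) ** (m + 1) * chebyshev_u(2 * m - 1, z / 2)


for m in range(1, 6):
    for theta in (0.1, 0.4, 0.9, 1.3):
        lhs = r_closed_form(m, 2 * math.sin(theta))
        rhs = math.sin(2 * m * theta) / math.cos(theta)
        assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
```

For instance, r_1(z) = z and r_2(z) = 2z − z^3 follow directly from U_1 and U_3.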

11. Prove that f_m(z) = q_{⌊m/2⌋}(z) − r_{⌈m/2⌉}(z) and that f_m(z) f_{m−1}(z) = 1 − r_m(z).
Then use the result of exercise 10. (This explicit form for the denominator was first
discovered by David E. Ferguson.)

13. See exercise 5.4.6-6.

SECTION  5.4.4 

1. When writing an ascending run, first write a sentinel record containing −∞ before
outputting the run. (And a +∞ sentinel should be written at the end of the run as
well, if the output is ever going to be read forward, as on the final pass.) For descending
runs, interchange the roles of −∞ and +∞.
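The sentinel idea can be sketched with Python lists standing in for tapes. This modeling is an assumption made only for illustration: appending writes forward, and `pop()` reads backward. Because each ascending run is preceded by a −∞ sentinel, a read-backward merge can stop as soon as the largest current key is −∞, without a separate end-of-run test. The names are ours.

```python
import math

NEG_INF = -math.inf


def write_ascending_run(tape, keys):
    """Append an ascending run to a tape, preceded by a -inf sentinel record."""
    tape.append(NEG_INF)
    tape.extend(sorted(keys))


def merge_last_runs_backward(tapes):
    """Read the last run of each tape backward, merging into one descending run.

    Read backward, each run yields its keys in descending order and then its
    -inf sentinel, so the loop ends when the largest current key is -inf.
    """
    heads = [tape.pop() for tape in tapes]
    output = []
    while True:
        i = max(range(len(heads)), key=heads.__getitem__)
        if heads[i] == NEG_INF:
            return output
        output.append(heads[i])
        heads[i] = tapes[i].pop()


t1, t2 = [], []
write_ascending_run(t1, [9, 1, 4])
write_ascending_run(t2, [10, 3, 2])
print(merge_last_runs_backward([t1, t2]))  # [10, 9, 4, 3, 2, 1]
```

Under the answer above, the descending output run would in turn be written with a +∞ sentinel in front.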

2.  The  smallest  number  on  level  n + 1 is  equal  to  the  largest  on  level  n;  hence  the 
columns  are  nondecreasing,  regardless  of  the  way  we  permute  the  numbers  in  any 
particular  row. 
3. In fact, during the merge process the first run on T2-T6 will always be descending,
and the first on T1 will always be ascending. (By induction.)

4. It requires several “copy” operations on the second and third phases; the approximate
extra cost is (log 2)/(log ρ) passes, where ρ is the “growth ratio” in Table 5.4.2-1.

5. If α is a string, let α^R denote its left-right reversal. We have

E_n = A^R_{n−1} + 1,
D_n = A^R_{n−2} A^R_{n−1} + 1,
C_n = A^R_{n−3} A^R_{n−2} A^R_{n−1} + 1,
B_n = A^R_{n−4} A^R_{n−3} A^R_{n−2} A^R_{n−1} + 1, and
A_n = A^R_{n−5} A^R_{n−4} A^R_{n−3} A^R_{n−2} A^R_{n−1} + 1 = n − Q_n,

where

Q_n = Q_{n−1}(Q_{n−2} + 1)(Q_{n−3} + 2)(Q_{n−4} + 3)(Q_{n−5} + 4), n ≥ 1,

Q_0 = 0, and Q_n = ε for n < 0.

These strings A_n, B_n, ... contain the same entries as the corresponding strings in
Section 5.4.2, but in another order. Note that adjacent merge numbers always differ
by 1. An initial run must be A if and only if its merge number is even, D if and
only if odd. Simple distribution schemes such as Algorithm 5.4.2D are not quite as
effective at placing dummies into high-merge-number positions; therefore it is probably
advantageous to compute Q_n between phases 1 and 2, in order to help control dummy
run placement.
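The string recurrences can be generated mechanically and compared with those of Section 5.4.2. In the sketch below (names ours), strings are Python lists of merge numbers, A_0 = (0), and strings with negative subscripts are empty. The Section 5.4.2 strings are taken to satisfy A_n = (A_{n−1}+1)B_{n−1}, B_n = (A_{n−1}+1)C_{n−1}, ..., D_n = (A_{n−1}+1)E_{n−1}, E_n = A_{n−1}+1. The checks confirm equal multisets of entries, and adjacent entries differing by 1, for n ≤ 10.

```python
def plus(string, c):
    """Add c to every merge number in a string (a Python list)."""
    return [x + c for x in string]


def read_backward_strings(n_max):
    """Strings E_n, ..., A_n of the answer above, for 1 <= n <= n_max.

    The X-string uses the last 1, 2, 3, 4, or 5 of A^R_{n-5} ... A^R_{n-1}.
    """
    A = {0: [0]}
    result = {}
    for n in range(1, n_max + 1):
        parts = [A.get(n - i, [])[::-1] for i in range(5, 0, -1)]  # n-5 .. n-1
        result[n] = {name: plus(sum(parts[5 - size:], []), 1)
                     for name, size in zip("EDCBA", range(1, 6))}
        A[n] = result[n]["A"]
    return result


def polyphase_strings(n_max):
    """Section 5.4.2 strings: A_n = (A_{n-1}+1)B_{n-1}, ..., E_n = A_{n-1}+1."""
    cur = {"A": [0], "B": [], "C": [], "D": [], "E": []}
    result = {}
    for n in range(1, n_max + 1):
        a1 = plus(cur["A"], 1)
        cur = {"A": a1 + cur["B"], "B": a1 + cur["C"], "C": a1 + cur["D"],
               "D": a1 + cur["E"], "E": a1}
        result[n] = cur
    return result


new, old = read_backward_strings(4), polyphase_strings(4)
print(new[4]["A"], old[4]["A"])  # [1, 2, 3, 2, 3, 4, 3, 2] [4, 3, 3, 2, 3, 2, 2, 1]
```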

6. y^{(4)} = (+1, +1, −1, +1)
   y^{(3)} = (+1, 0, −1, 0)
   y^{(2)} = (+1, −1, +1, +1)
   y^{(1)} = (−1, +1, +1, +1)
   y^{(0)} = (1, 0, 0, 0)




Incidentally,  34  is  apparently  the  smallest  Fibonacci  number  Fn  for  which  polyphase 
doesn’t  produce  the  optimum  read-backward  merge  for  Fn  initial  runs  on  three  tapes. 
This  tree  has  external  path  length  178,  which  beats  polyphase’s  176. 

8.  For  T = 4,  the  tree  with  external  path  length  13  is  not  T-lifo,  and  every  tree  with 
external  path  length  14  includes  a one-way  merge. 

9. We may consider a complete (T − 1)-ary tree, by the result of exercise 2.3.4.5-6; the
degree of the “last” internal node is between 2 and T − 1. When there are (T − 1)^q − m
external nodes, ⌊m/(T − 2)⌋ of them are on level q − 1, and the rest are on level q.
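The level counts in this answer give the external path length of the complete tree directly. The sketch below is an illustration with names of our own; it computes that length for n leaves and T tapes. For T = 3 it reproduces the values 0, 2, 5, 8, 12, 16, 20, 24 of the function A(n) defined in exercise 15.

```python
def complete_tree_path_length(n, T):
    """External path length of a complete (T-1)-ary tree with n leaves, T >= 3.

    q is the least integer with (T-1)^q >= n and m = (T-1)^q - n; then
    floor(m / (T-2)) leaves lie on level q-1 and the rest on level q.
    """
    if n < 1 or T < 3:
        raise ValueError("need n >= 1 and T >= 3")
    q = 0
    while (T - 1) ** q < n:
        q += 1
    m = (T - 1) ** q - n
    shallow = m // (T - 2)  # leaves on level q - 1
    return (q - 1) * shallow + q * (n - shallow)


print([complete_tree_path_length(n, 3) for n in range(1, 9)])  # [0, 2, 5, 8, 12, 16, 20, 24]
```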

11. True by induction on the number of initial runs. If there is a valid distribution
with S runs and two adjacent runs in the same direction, then there is one with fewer
than S runs; but there is none when S = 1.

12.  Conditions  (a)  and  (b)  are  obvious.  If  either  configuration  in  (4)  is  present,  for 
some  tape  name  A and  some  i < j < k,  node  j must  be  in  a subtree  below  node  i 
and  to  the  left  of  node  k,  by  the  definition  of  preorder.  Hence  the  case  “ j — I”  can’t 
be  present,  and  A must  be  the  “special”  name  since  it  appears  on  an  external  branch. 
But  this  contradicts  the  fact  that  the  special  name  is  supposed  to  be  on  the  leftmost 
branch  below  node  i. 

13.  Nodes  now  numbered  4,  7,  11,  13  could  be  external  instead  of  one-way  merges. 
(This  gives  an  external  path  length  one  higher  than  the  polyphase  tree.) 

15. Let the tape names be A, B, and C. We shall construct several species of trees,
botanically identified by their root and leaf (external node) structure:

Type r(A)       Root A
Type s(A, C)    Root A, no C leaves
Type t(A)       Root A, no A leaves
Type u(A, C)    Root A, no C leaves, no compound B leaves
Type v(A, C)    Root A, no C leaves, no compound A leaves
Type w(A, C)    Root A, no A leaves, no compound C leaves
A “compound leaf” is a leaf whose sibling is not a leaf. We can grow a 3-lifo type r(A)
tree by first growing its left subtree as a type s(B, C), then growing the right subtree as
type r(C). Similarly, type s(A, C) comes from a type s(B, C) then t(C); type u(A, C)
from v(B, C) and w(C, B); type v(A, C) from u(B, C) and w(C, A). We can grow a
3-lifo type t(A) tree whose left subtree is type u(B, A) and whose right subtree is type
s(C, A), by first letting the left subtree grow except for its (non-compound) C leaves
and its right subtree; at this point the left subtree has only A and B leaves, so we
can grow the right subtree of the whole tree, then grow off the A leaves of the left
left subtree, and finally grow the left right subtree. Similarly, a type w(A, C) tree can
be fabricated from a u(B, A) and a v(C, A). [The tree of exercise 7 is an r(A) tree
constructed in this manner.]

Let r(n), ..., w(n) denote the minimum external path length over all n-leaf trees
of the relevant type, when they are constructed by such a procedure. We have r(1) =
s(1) = v(1) = 0, r(2) = t(2) = w(2) = 2, t(1) = u(1) = w(1) = s(2) = u(2) = v(2) =
∞; and for n ≥ 3,

r(n) = n + min_k (s(k) + r(n − k)),    u(n) = n + min_k (v(k) + w(n − k)),
s(n) = n + min_k (s(k) + t(n − k)),    v(n) = n + min_k (u(k) + w(n − k)),
t(n) = n + min_k (u(k) + s(n − k)),    w(n) = n + min_k (u(k) + v(n − k)).

It follows that r(n) ≤ s(n) ≤ u(n), s(n) ≤ v(n), and r(n) ≤ t(n) ≤ w(n) for all n;
furthermore s(2n) = t(2n + 1) = ∞. (The latter is evident a priori.)

Let A(n) be the function defined by the laws A(1) = 0, A(2n) = 2n + 2A(n),
A(2n + 1) = 2n + 1 + A(n) + A(n + 1); then A(2n) = 2n + A(n − 1) + A(n + 1) − (0 or 1)
for all n ≥ 2. Let C be a constant such that, for 4 ≤ n ≤ 8,

i) n even implies that w(n) ≤ A(n) + Cn − 1.

ii) n odd implies that u(n) and v(n) are ≤ A(n) + Cn − 1.

(This actually works for all C ≥ 5/6.) Then an inductive argument, choosing k to be
⌊n/2⌋ ± 1 as appropriate, shows that the relations are valid for all n ≥ 4. But A(n) is
the lower bound in (9) when T = 3, and r(n) ≤ min(u(n), v(n), w(n)), hence we have
proved that A(n) ≤ K_3(n) ≤ r(n) ≤ A(n) + (5/6)n − 1. [The constant 5/6 can be improved.]
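The recurrences can be tabulated directly. The Python sketch below uses the initial values as listed in this answer, including v(1) = 0 and u(1) = ∞. It computes A(n) from its defining laws and finds the least constant C that satisfies (i) and (ii) for 4 ≤ n ≤ 8. Under these assumptions the result is 5/6, forced by w(6) = 20 against A(6) = 16. The function names are ours.

```python
import math
from fractions import Fraction
from functools import lru_cache

INF = math.inf


def lifo_recurrences(n_max):
    """Tabulate r, s, t, u, v, w for 1 <= n <= n_max from the recurrences above."""
    r, s, t = {1: 0, 2: 2}, {1: 0, 2: INF}, {1: INF, 2: 2}
    u, v, w = {1: INF, 2: INF}, {1: 0, 2: INF}, {1: INF, 2: 2}

    def grow(left, right, n):
        # n + min over k of (left subtree with k leaves) + (right subtree with n - k)
        return n + min(left[k] + right[n - k] for k in range(1, n))

    for n in range(3, n_max + 1):
        r[n] = grow(s, r, n)
        s[n] = grow(s, t, n)
        t[n] = grow(u, s, n)
        u[n] = grow(v, w, n)
        v[n] = grow(u, w, n)
        w[n] = grow(u, v, n)
    return r, s, t, u, v, w


@lru_cache(maxsize=None)
def A(n):
    """A(1) = 0, A(2n) = 2n + 2A(n), A(2n+1) = 2n + 1 + A(n) + A(n+1)."""
    if n == 1:
        return 0
    half = n // 2
    if n % 2 == 0:
        return n + 2 * A(half)
    return n + A(half) + A(half + 1)


def smallest_C(lo=4, hi=8):
    """Least C with w(n) <= A(n) + Cn - 1 (n even), u(n), v(n) <= A(n) + Cn - 1 (n odd)."""
    r, s, t, u, v, w = lifo_recurrences(hi)
    needs = []
    for n in range(lo, hi + 1):
        worst = w[n] if n % 2 == 0 else max(u[n], v[n])
        needs.append(Fraction(worst - A(n) + 1, n))
    return max(needs)


print(smallest_C())  # 5/6
```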
17. [The following method was used in the UNIVAC III sort program, and presented
at the 1962 ACM Sort Symposium.]

Level    T1     T2     T3     T4     T5
  0       1      0      0      0      0
  1       5      4      3      2      1
  2      55     50     41     29     15
  n     a_n    b_n    c_n    d_n    e_n

Level n + 1:
  T1: 5a_n + 4b_n + 3c_n + 2d_n + e_n
  T2: 4a_n + 4b_n + 3c_n + 2d_n + e_n
  T3: 3a_n + 3b_n + 3c_n + 2d_n + e_n
  T4: 2a_n + 2b_n + 2c_n + 2d_n + e_n
  T5: a_n + b_n + c_n + d_n + e_n

To get from level n to level n + 1 during the initial distribution, insert k_1 “sublevels”
with (4, 4, 3, 2, 1) runs added respectively to tapes (T1, T2, ..., T5), k_2 “sublevels” with
(4, 3, 3, 2, 1) runs added, k_3 with (3, 3, 2, 2, 1), k_4 with (2, 2, 2, 1, 1), k_5 with (1, 1, 1, 1, 0),
where k_1 ≤ a_n, k_2 ≤ b_n, k_3 ≤ c_n, k_4 ≤ d_n, k_5 ≤ e_n. [If (k_1, ..., k_5) = (a_n, ..., e_n) we
have reached level n + 1.] Add dummy runs if necessary to fill out a sublevel. Then
merge k_1 + k_2 + k_3 + k_4 + k_5 runs from (T1, ..., T5) to T6, merge k_1 + ··· + k_4 from
(T1, ..., T4) to T5, ..., merge k_1 from T1 to T2; and merge k_1 from (T2, ..., T6)
to T1, k_2 from (T3, ..., T6) to T2, ..., k_5 from T6 to T5.
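The level formulas and the sublevel shapes fit together. Adding a_n sublevels of the first shape, b_n of the second, and so on, to level n gives exactly level n + 1. The Python sketch below (names ours, dummy runs ignored) regenerates the table and checks that identity.

```python
SUBLEVELS = [
    (4, 4, 3, 2, 1),  # k_1 sublevels of this shape
    (4, 3, 3, 2, 1),  # k_2
    (3, 3, 2, 2, 1),  # k_3
    (2, 2, 2, 1, 1),  # k_4
    (1, 1, 1, 1, 0),  # k_5
]


def univac_next_level(level):
    """Level n+1 from level n = (a_n, b_n, c_n, d_n, e_n), as in the table."""
    a, b, c, d, e = level
    return (5*a + 4*b + 3*c + 2*d + e,
            4*a + 4*b + 3*c + 2*d + e,
            3*a + 3*b + 3*c + 2*d + e,
            2*a + 2*b + 2*c + 2*d + e,
            a + b + c + d + e)


def add_sublevels(level, ks):
    """Run counts on T1..T5 after k_1, ..., k_5 complete sublevels."""
    totals = list(level)
    for k, shape in zip(ks, SUBLEVELS):
        for tape, runs in enumerate(shape):
            totals[tape] += k * runs
    return tuple(totals)


levels = [(1, 0, 0, 0, 0)]
for _ in range(2):
    levels.append(univac_next_level(levels[-1]))
print(levels)  # [(1, 0, 0, 0, 0), (5, 4, 3, 2, 1), (55, 50, 41, 29, 15)]
```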
18. (Solution by M. S. Paterson.) Suppose record j is written on the sequence of tape
numbers τ_j. At most C^{|τ|} records can have a given sequence τ, where C depends on
the internal memory size (see Section 5.4.8). Hence |τ_1| + ··· + |τ_N| = Ω(N log_T N).


20.  A strongly  T-fifo  tree  has  a T-fifo  labeling  in  which  there  are  no  three  branches 
having  the  respective  forms 


[diagram of the two forbidden branch forms, not recoverable from the scan]


for  some  tape  name  A and  some  i<j<k<l<s.  Informally,  when  we  grow  on  an  A , 
we  must  grow  on  all  other  A’s  before  creating  any  new  A’s. 
for some fixed tape names A, B, C, D. Since all occurrences are replaced by the same
pattern, the lifo or fifo order makes no difference in the structure of the tree.

Stating the condition in terms of the vector model: Whenever (y^{(k+1)} ≠ y^{(k)} or
k = m) and y_j^{(k)} = −1, we have y_j^{(k)} + ··· + y_j^{(1)} + y_j^{(0)} = 0.

23. (a) Assume that v_1 ≤ v_2 ≤ ··· ≤ v_T; the “cascade” stage

(1, ..., 1, −1)^{v_T} (1, ..., 1, −1, 0)^{v_{T−1}} ... (1, −1, 0, ..., 0)^{v_2}

takes C(v) into v. (b) Immediate, since C(v)_k ≤ C(w)_k for all k. (c) If v is obtained
in q stages, we have u → u^{(1)} → ··· → u^{(q)} = v for some unit vector u, and some
other vectors u^{(1)}, ..., u^{(q−1)}. Hence u^{(1)} ⪯ C(u), u^{(2)} ⪯ C(C(u)), ..., v ⪯ C^{(q)}(u).
Hence v_1 + ··· + v_T is less than or equal to the sum of the elements of C^{(q)}(u), and
the latter is obtained in cascade merge. [This theorem generalizes the result of exercise
5.4.3-4. Unfortunately the concept of “stage” as defined here doesn’t seem to have any
practical significance.]

24. Let y^{(m)} ... y^{(1)} be a stage that reduces w to v. If we have y_j^{(i)} = −1, y_j^{(i−1)} = 0,
..., y_j^{(k+1)} = 0, and y_j^{(k)} = −1, for some k < i − 1, we can insert y^{(k)} between y^{(i)}
and y^{(i−1)}. Repeat this operation until all (−1)'s in each column are adjacent. Then
if y_j^{(i)} = 0 and y_j^{(i−1)} ≠ 0 it is possible to set y_j^{(i)} ← 1; ultimately each column
consists of +1's followed by −1's followed by 0s, and we have constructed a stage that
reduces w' to v for some w' ⪯ w. Permuting the columns, this stage takes the form
(1, ..., 1, −1)^{a_T} ... (1, −1, 0, ..., 0)^{a_2} (−1, 0, ..., 0)^{a_1}. The sequence of T − 1 relations

(x_1, ..., x_T) ⪯ (x_1+x_T, ..., x_{T−1}+x_T, 0)
  ⪯ (x_1+x_{T−1}+x_T, ..., x_{T−2}+x_{T−1}+x_T, x_T, 0)
  ⪯ (x_1+x_{T−2}+x_{T−1}+x_T, ..., x_{T−3}+x_{T−2}+x_{T−1}+x_T, x_{T−1}+x_T, x_T, 0)
  ⪯ ···
  ⪯ (x_1+x_2+x_3+···+x_T, x_3+···+x_T, ..., x_{T−1}+x_T, x_T, 0)

now shows that the best choice of the a's is a_T = v_T, a_{T−1} = v_{T−1}, ..., a_2 = v_2,
a_1 = 0. And the result is best if we permute columns so that v_1 ≤ ··· ≤ v_T.

25. (a) Assume that v_{T−k+1} ≥ ··· ≥ v_T ≥ v_1 ≥ ··· ≥ v_{T−k} and use

(1, ..., 1, −1, 0, ..., 0)^{v_{T−k+1}} ... (1, ..., 1, 0, ..., 0, −1)^{v_T}.

(b) The sum of the l largest elements of D_k(v) is (l − 1)s_k + s_{k+l} for 1 ≤ l ≤ T − k.

(c) If v ⇒ w in a phase that uses k output tapes, we may obviously assume that
the phase has the form (1, ..., 1, −1, 0, ..., 0)^{a_1} ... (1, ..., 1, 0, ..., 0, −1)^{a_k}, with each
of the other T − k tapes used as input in each operation. Choosing a_1 = v_{T−k+1}, ...,
a_k = v_T is best. (d) See exercise 22(c). We always have k_1 = 1; and k = T − 2 always
beats k = T − 1 since we assume that at least one component of v is zero. Hence for
T = 3 we have k_1 ... k_q = 1^q and the initial distribution (F_{q+1}, F_q, 0). For T = 4 the
undominated strategies and their corresponding distributions are found to be

q = 2:   12 (3, 2, 0, 0)
q = 3:   121 (5, 3, 3, 0);   122 (5, 5, 0, 0)
q = 4:   1211 (8, 8, 5, 0);   1222 (10, 10, 0, 0);   1212 (11, 8, 0, 0)
q = 5:   12121 (19, 11, 11, 0);   12222 (20, 20, 0, 0);   12112 (21, 16, 0, 0)
q = 6:   122222 (40, 40, 0, 0);   121212 (41, 30, 0, 0)
q ≥ 7:   12^{q−1} (5 · 2^{q−3}, 5 · 2^{q−3}, 0, 0)




So for T = 4 and q ≥ 6, the minimum-phase merge is like balanced merge, with a slight
twist at the very end (going from (3, 2, 0, 0) to (1, 0, 1, 1) instead of to (0, 0, 2, 1)).

When T = 5 the undominated strategies are 1(32)^{n−1}2, 1(32)^{n−1}3 for q = 2n ≥ 2;
1(32)^{n−1}32, 1(32)^{n−1}22, 1(32)^{n−1}23 for q = 2n + 1 ≥ 3. (The first strategy listed
has most runs in its distribution.) On six tapes they are 13 or 14, 142 or 132 or 133,
1333 or 1423, then 13^{q−1} for q ≥ 5.

SECTION  5.4.5 

1. The following algorithm is controlled by a table A[L − 1] ... A[1] A[0] that essentially
represents a number in radix P notation. As we repeatedly add unity to this
number, the carries tell us when to merge. Tapes are numbered from 0 to P.

O1. [Initialize.] Set (A[L − 1], ..., A[0]) ← (0, ..., 0) and q ← 0. (During this
algorithm, q will equal (A[L − 1] + ··· + A[0]) mod T.)

O2. [Distribute.] Write an initial run on tape q, in ascending order. Set l ← 0.

O3. [Add one.] If l = L, stop; the output is on tape (−L) mod T, in ascending
order if and only if L is even. Otherwise set A[l] ← A[l] + 1, q ← (q + 1) mod T.

O4. [Carry?] If A[l] < P, return to O2. Otherwise merge to tape (q − l) mod T,
set A[l] ← 0 and q ← (q + 1) mod T, increase l by 1, and return to O3. ∎
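To see the carry mechanism at work, the following Python sketch simulates steps O1 to O4 with T = P + 1 tapes. Each tape is modeled as a list used as a stack, so the last run written is the first one read backward. The run representation, the helper names, and the use of `heapq.merge` for the P-way merge are our own assumptions. The sketch tracks only run contents and directions, not tape motion or timing.

```python
import heapq


def merge_read_backward(runs):
    """Merge runs given as (keys in the order written, written_ascending).

    Reading a tape backward reverses each run, so ascending runs arrive in
    descending order; the output is written in the order the merge produces it.
    """
    directions = {ascending for _, ascending in runs}
    if len(directions) != 1:
        raise ValueError("input runs must all have the same direction")
    written_ascending = directions.pop()
    streams = [keys[::-1] for keys, _ in runs]
    merged = list(heapq.merge(*streams, reverse=written_ascending))
    return merged, not written_ascending


def oscillating_sort(initial_runs, P, L):
    """Steps O1-O4 on T = P + 1 simulated tapes; needs exactly P**L initial runs."""
    T = P + 1
    tapes = [[] for _ in range(T)]  # each tape is a stack of (keys, ascending) runs
    A = [0] * L                     # O1: radix-P counter, A[0] least significant
    q = 0
    runs = iter(initial_runs)
    while True:
        tapes[q].append((sorted(next(runs)), True))  # O2: ascending run on tape q
        l = 0
        while True:
            if l == L:                               # O3: finished
                return tapes, (-L) % T
            A[l] += 1
            q = (q + 1) % T
            if A[l] < P:                             # O4: no carry, distribute again
                break
            target = (q - l) % T                     # O4: carry, so merge
            inputs = [tapes[i].pop() for i in range(T) if i != target]
            tapes[target].append(merge_read_backward(inputs))
            A[l] = 0
            q = (q + 1) % T
            l += 1


tapes, out = oscillating_sort([[3, 1], [4, 2], [8, 5], [7, 6]], P=2, L=2)
print(out, tapes[out])  # 1 [([1, 2, 3, 4, 5, 6, 7, 8], True)]
```

With P = 2 and L = 2, the four initial runs end as one ascending run on tape (−2) mod 3 = 1, as step O3 predicts.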

2.  Keep  track  of  how  many  runs  are  on  each  tape.  When  the  input  is  exhausted,  add 
dummy  runs  if  necessary  and  continue  merging  until  reaching  a situation  with  at  most 
one  run  on  each  tape  and  at  least  one  tape  empty.  Then  finish  the  sort  in  one  more 
merge,  rewinding  some  tapes  first  if  necessary.  (It  is  possible  to  deduce  the  orientation 
of  the  runs  from  the  A table.) 


Op 

TO 

T1 

T2 

Op 

TO 

T1 

T2 

Dist 

— 

A i 

AiAi 

Dist 

D2A\ 

Ax 

a4 

Merge 

d2 

— 

Ar 

Merge 

d2 

— 

a4d2 

Dist 

D2A\ 
Merge

— 

a4 

a4 

Merge 

d2 

d2 

— 

Dist 

— 

a4 

A4Ax 

Dist 

d2 

D2A\ 

Ax 

Copy 

— 

A4Dx 

a4 

Merge 

d2d2 

d2 

— 

Copy 

— 

a4 

A4Ax 

Merge 

d2 

— 

a4 

Merge 

d5 

— 

a4 

At  this  point  T2  would  be  rewound  and  a final  merge  would  complete  the  sort. 

To  avoid  useless  copying  in  which  runs  are  simply  shifted  back  and  forth,  we  can 
say  “If  the  input  is  exhausted,  go  to  B7”  at  the  end  of  B3,  and  add  the  following  new 
step: 

B7. [Do the endgame.] Set s ← 1, and go to B2 after repeating the following
operations until l = 0: Set s' ← A[l − 1, q], and set q' and r' to the indices
such that A[l − 1, q'] = −1 and A[l − 1, r'] = −2. (We will have q' = r, and
s' ≤ A[l − 1, j] ≤ s' + 1 for j ≠ q', j ≠ r'.) If s' − s is odd, promote level l,
otherwise demote it (see below). Then merge to tape r, reading backwards;
set l ← l − 1, A[l, q] ← −1, A[l, r] ← s' + 1, r ← r', and repeat.

Here “promotion” means to repeat the following operation until (q + (−1)^s) mod T = r:
Set p ← (q + (−1)^s) mod T and copy one run from tape p to tape q, then set A[l, q] ←
s + 1, A[l, p] ← −1, q ← p. And “demotion” means to repeat the following until
(q − (−1)^s) mod T = r: Set p ← (q − (−1)^s) mod T and copy one run from tape p to
tape q, then set A[l, q] ← s, A[l, p] ← −1, q ← p. The copy operation reads backwards
on tape p, hence it reverses the direction of the run being copied. If D[p] > 0 when
copying from p to q, we simply decrease D[p] and increase D[q] instead of copying.

[The basic idea is that, once the input is exhausted, we want to reduce to at most
one run on each tape. The parity of each nonnegative entry A[l, j] tells us whether
a run is ascending or descending. The smallest S for which this change makes any
difference is P^3 + 1. When P is large, the change hardly ever makes much difference,
but it does keep the computer from looking too foolish in some circumstances. The
algorithm should also be changed to handle the case S = 1 more efficiently.]

4. We can, in fact, omit setting A[0, 0] in step B1, A[l, q] in steps B3 and B5. [But
A[l, r] must be set in step B3.] The new step B7 in the previous answer does need the
value of A[l, q] (unless it explicitly uses the fact that q' = r, as noted there).

5. P^{2k} − (P − 1)P^{2k−2} < S < P^{2k} for some k > 0.

SECTION  5.4.6 

1. ⌊23000480/(n + 480)⌋ n.

2. At the instant shown, all the records in that buffer have been moved to the output.
Step F2 insists that the test “Is output buffer full?” precede the test “Is input buffer
empty?” while merging, otherwise we would have trouble (unless the changes of exercise
4 were made).

3. No; for example, we might reach a state with P buffers 1/P full and P − 1 buffers
full, if file i contains the keys i, i + P, i + 2P, ..., for 1 ≤ i ≤ P. This example shows
that 2P input buffers would be necessary for continuous output even if we allowed
simultaneous reading, unless we reallocated memory for partial buffers. [Well, we
don’t really need 2P buffers if the blocks contain fewer than P − 1 records; but that is
unlikely.]

4. Set up S sooner (in steps F1 and F4 instead of F3).

5. If, for example, all keys of all files were equal, we couldn’t simply make arbitrary
decisions while forecasting; the forecast must be compatible with decisions made by the
merging process. One safe way is to find the smallest possible m in steps F1 and F4,
namely to consider a record from file C[i] to be less than all records having the same
key on file C[j] whenever i < j. (In essence, the file number is appended to the key.)
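The tie-breaking rule can be sketched by comparing (key, file number) pairs, so that forecasting and merging use the same total order. In this Python illustration, `last_keys[i]` is assumed to be the key of the last record currently in file i's input buffer. The names are hypothetical, and the code is not Algorithm F itself.

```python
import heapq


def forecast(last_keys):
    """Index of the input buffer that the merge will empty first.

    A record from file i counts as smaller than an equal key on file j
    whenever i < j, so ties go to the lower file number.
    """
    return min(range(len(last_keys)), key=lambda i: (last_keys[i], i))


def merge_with_file_numbers(files):
    """Merge sorted files on the extended key (key, file number)."""
    tagged = [[(key, i) for key in keys] for i, keys in enumerate(files)]
    return list(heapq.merge(*tagged))


files = [[1, 2], [1, 2], [2]]
print(forecast([keys[-1] for keys in files]))  # 0
print(merge_with_file_numbers(files))
```

Because both functions order records by the same extended key, the buffer that is forecast to empty first is the one whose last record the merge outputs first.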

6. In step C1 also set TAPE[T + 1] ← T + 1. In step C8 the merge should be to
TAPE[p + 2] instead of TAPE[p + 1]. In step C9, set (TAPE[1], ..., TAPE[T + 1]) ←
(TAPE[T + 1], ..., TAPE[1]).

7.  The  method  used  in  Chart  A is  {A\D\)aAoDqA\D\{AoDo{A\D\)3)2Ao,  D\(A\D\)2 
AoDo{A\Di)3 AoDoaAoDoAo,  D\ AoDo(A\D\)3 AqDooiA\D\ Ao,  D\A\D\oiA\D\Aq, 
where  ol  — {AoDo)3 A\DiAoDo{AiDi')2 (^AoDo)7 AiDi^AoDq)3 AiDiAqDo-  The  first 
merge  phase  writes  DoA-iD-iA\D\A^D^AoDoAiD\A\D\AiD^AoDoAiDiAoDo{A\Di)‘l 
on  tape  5;  the  next  writes  A4D4A4D4A1D1  A4D4A0D0A1D1 AxD\ At  on  tape  1;  the 
next,  D13A4D4A4D4A0D0A10  on  tape  4.  The  final  phases  are 

A4D4A4  - — P19A3D3A12  D13A4D4A4  D0A3 

A4  D23A11  P19A3  D13A4  — 

D 23  D 19  D\3  D 22 

A77  

8. No, since at most S stop/starts are saved, and since the speed of the input tape (not
the output tapes) tends to govern the initial distribution time anyway. The other
advantages of the distribution schemes used in Chart A outweigh this minuscule disadvantage.
9. P = 5, B = 8300, B' = 734, S = ⌈(3 + 1/P)N/(6B')⌉ + 1 = 74, u ≈ 1.094,
α ≈ 0.795, β ≈ −1.136, α' = β' = 0; Eq. (9) ≈ 855 seconds, to which we add the time
for initial rewind, for a total of 958 seconds. The savings of about one minute in the
merging time does not compensate for the loss of time due to the initial rewinding and
tape changing (unless perhaps we are in a multiprogramming environment).

10.  The  rewinds  during  standard  polyphase  merge  involve  about  54  percent  of  the  file 
(the  “pass/phase”  column  in  Table  5.4. 2-1),  and  the  longest  rewinds  during  standard 
cascade  merge  involve  approximately  akan-k/a„  » (4/(2T-  1))  cos2(7r/(4T  - 2))  < ± 
of  the  file,  by  exercise  5. 4. 3-5  and  Eq.  5.4.3-(i3). 

11. Only initial and final rewinds get to make use of the high-speed feature, since the
reel is only a little more than 10/23 full when it contains the whole example file. Using
π = .946 ln S − 1.204, π' = 1/8 in example 8, we get the following estimated totals
for examples 1-9, respectively:


1115,  1296,  1241,  1008,  1014,  967,  891,  969,  856. 


12. (a) An obvious solution with 4P + 4 buffers simply reads and writes simultaneously
from paired tapes. But note that three output buffers are sufficient: At a given
moment we can be performing the second half of a write from one, the first half
of a write from another, and outputting into a third. This approach suggests a
corresponding improvement in the input buffer situation. It turns out that 3P input
buffers and 3 output buffers are necessary and sufficient, using a slightly weakened
forecasting technique. A simpler and superior approach, suggested by J. Sue, adds a
“lookahead key” to each block, specifying the final key of the subsequent block. Sue’s
method requires 2P + 1 input buffers and 4 output buffers, and it is a straightforward
modification of Algorithm F. (See also Section 5.4.9.)

(b)  In  this  case  the  high  value  of  a means  that  we  must  do  between  five  and  six 
passes  over  the  data,  which  wipes  out  the  advantage  of  double-quick  merging.  The  idea 
works  out  much  better  on  eight  or  nine  tapes. 


13.  No;  consider,  for  example,  the  situation  just  before  Ai6Ai6 Ai6Ai6.  But  two 
reelfuls  can  be  handled. 

/0  -p0z  0 z~l\  / (l-p>iz  -p0z  0 z-l\ 

—p>iz  l-piz  -p0z  z-  1 


14.  det 


0 1 — piz  —poz  z — 1 

10  0 0 

\0  0 0 1 J 

15.  The  A matrix  has  the  form 


det 


0 

V o 


0 


1 0 

0 1 J 


/ Bioz 

Buz  ... 

B\nZ 

jBio  + Bn  + • 

• ‘ + Bin  — 1) 

A = 

Bnoz 

Bn\z  . . . 

BnnZ 

1 -z 

(H) 

o o 

..  0 1 

. 0 0 

0 

0 

o o 

Bn0  + Bn\  + • • 

• + Bnn  — 1. 

Therefore 


det(/  — A)  = det 


/ 1 — Biqz  —Buz 


BnQZ  BniZ 

Vo  o 


-B 


l(n  — 1)^ 


-Blnz  \ 


1 Pn(n  — l)Z  BnnZ 

-1  1 / 
and we can add all columns to the first column, then factor out (1 − z). Consequently
g_Q(z) has the form h_Q(z)/(1 − z), and = h_Q(1) because we have h_Q(1) ≠ 0 and
det(I − A) ≠ 0 for |z| < 1.

SECTION  5.4.7 

1.  Sort  from  least  significant  digit  to  most  significant  digit  in  the  number  system 
whose  radices  are  alternately  P and  T — P.  (If  pairs  of  digits  are  grouped,  we  have 
essentially  the  pure  radix  P • (T  — P).  Thus,  if  P = 2 and  T = 7,  the  number  system 
is  “biquinary,”  related  to  decimal  notation  in  a simple  way.) 
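A least-significant-digit-first distribution sort in such a mixed radix takes only a few lines. In this sketch, in-memory lists stand in for tapes or pockets. Putting the radix P in the least significant position is an arbitrary choice of ours. With P = 2 and T = 7, the radices 2, 5, 2, 5 cover keys 0 through 99.

```python
def mixed_radix_lsd_sort(keys, radices):
    """Stable least-significant-digit distribution sort in a mixed radix.

    radices[0] is the radix of the least significant digit; every key must be
    smaller than the product of the radices.
    """
    place = 1
    for radix in radices:
        buckets = [[] for _ in range(radix)]  # one "pocket" per digit value
        for key in keys:
            buckets[(key // place) % radix].append(key)
        keys = [key for bucket in buckets for key in bucket]
        place *= radix
    return keys


P, T = 2, 7
biquinary = [P, T - P, P, T - P]  # alternating radices; product 100
print(mixed_radix_lsd_sort([37, 5, 99, 0, 42, 10, 73, 58], biquinary))  # [0, 5, 10, 37, 42, 58, 73, 99]
```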

2. If K is a key between 0 and F_n − 1, let the Fibonacci representation of F_n − 1 − K
be a_{n−2}F_{n−1} + ··· + a_1F_2, where the a_j are 0 or 1, and no two consecutive 1s appear.
After phase j, tape (j + 1) mod 3 contains the keys with a_j = 0, and tape (j − 1) mod 3
contains those with a_j = 1, in decreasing order of a_{j−1} ... a_1.

[Imagine a card sorter with two pockets, “0” and “1”, and consider the procedure
of sorting F_n cards that have been punched with the keys a_{n−2} ... a_1 in n − 2 columns.
The  conventional  procedure  for  sorting  these  into  decreasing  order,  starting  at  the  least 
significant  digit,  can  be  simplified  since  we  know  that  everything  in  the  “1”  pocket  at 
the  end  of  one  pass  will  go  into  the  “0”  pocket  on  the  following  pass.] 
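The Fibonacci number system behind this answer can be illustrated with a greedy digit computation. The sketch below (names ours) produces a_{n−2} ... a_1 for a number x with 0 ≤ x < F_n, where digit a_j carries weight F_{j+1}; in the answer, x would be F_n − 1 − K. It also checks, for n = 7, two properties. First, the lexicographic order of the digit strings agrees with numeric order. Second, no two consecutive digits are 1, which is why everything in the “1” pocket on one pass goes into the “0” pocket on the next.

```python
def fibonacci_list(n):
    """[F_0, F_1, ..., F_n] with F_1 = F_2 = 1."""
    fib = [0, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    return fib


def fibonacci_digits(x, n):
    """Digits a_{n-2} ... a_1 of x (0 <= x < F_n); digit a_j has weight F_{j+1}.

    The greedy choice never sets two consecutive digits to 1.
    """
    fib = fibonacci_list(n)
    if not 0 <= x < fib[n]:
        raise ValueError("need 0 <= x < F_n")
    digits = []
    for j in range(n - 2, 0, -1):
        if fib[j + 1] <= x:
            digits.append(1)
            x -= fib[j + 1]
        else:
            digits.append(0)
    return digits


reps = [fibonacci_digits(x, 7) for x in range(13)]
assert reps == sorted(reps)  # digit order matches numeric order
assert all(not (r[i] and r[i + 1]) for r in reps for i in range(len(r) - 1))
print(fibonacci_digits(12, 7))  # [1, 0, 1, 0, 1]
```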

4.  If  there  were  an  external  node  on  level  2 we  could  not  construct  such  a good  tree. 
Otherwise  there  are  at  most  three  external  nodes  on  level  3,  and  six  on  level  4,  since 
each  external  node  is  supposed  to  appear  on  the  same  tape. 


6.  09,  08,  ... , 00,  19,  ... , 10,  29,  ... , 20,  39,  ... , 30,  40,  41,  ... , 49,  59,  ... , 50,  60, 
61,  ...,  99. 

7.  Yes;  first  distribute  the  records  into  smaller  and  smaller  subfiles  until  obtaining 
one-reel  files  that  can  be  sorted  individually.  This  is  dual  to  the  process  of  sorting 
one-reel  files  and  then  merging  them  into  larger  and  larger  multireel  files. 

SECTION  5.4.8 

1.  Yes.  If  we  alternately  use  ascending  and  descending  order  in  the  selection  tree,  we 
have  in  effect  an  order-P  cocktail-shaker  sort.  (See  exercise  9.) 

2. Let Z_N = Y_N − X_N, and solve the recurrence for Z_N by noting that

(N + 1)N Z_{N+1} = N(N − 1)Z_N + N^2 + N;

Z_N = (1/3)(N + 1) + c/(N(N − 1)), for N > M, where c is a constant fixed by the starting value.


Now  eliminate  Yjv  and  obtain 


= f(H„+.-«„„)+2(jAT-iiA5) 

2 / M+2  \ / 1 1 \ 3M+4 

3 \ 3 y V(7V+l)iV(Ar-l)  (M+2)(M+1)mJ+  M+2  ’ 


N > M. 
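The recurrence for Z_N can be checked with exact rational arithmetic. Writing W_N = N(N − 1)Z_N turns it into W_{N+1} = W_N + N(N + 1), so N(N − 1)(Z_N − (N + 1)/3) stays constant. The sketch below confirms this numerically for a starting value Z_M chosen arbitrarily by us.

```python
from fractions import Fraction


def z_values(M, z_M, N_max):
    """Exact Z_M, ..., Z_{N_max} from (N+1) N Z_{N+1} = N (N-1) Z_N + N^2 + N, M >= 1."""
    z = {M: Fraction(z_M)}
    for N in range(M, N_max):
        z[N + 1] = (N * (N - 1) * z[N] + N * N + N) / ((N + 1) * N)
    return z


def leftover_constant(z, N):
    """N (N-1) (Z_N - (N+1)/3), which the recurrence keeps fixed."""
    return N * (N - 1) * (z[N] - Fraction(N + 1, 3))


z = z_values(3, 7, 12)
print(sorted({leftover_constant(z, N) for N in z}))  # [Fraction(34, 1)]
```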


3. Yes; find a median element in O(N) steps, using a construction like that of
Theorem  5.3.3L,  and  use  it  to  partition  the  file.  Another  interesting  approach,  due 
to  R.  W.  Floyd  and  A.  J.  Smith,  is  to  merge  two  runs  of  N items  in  O(N)  units  of 
time  as  follows:  Spread  the  items  out  on  the  tapes,  with  spaces  between  them,  then 
successively  fill  each  space  with  a number  specifying  the  final  position  of  the  item  just 
preceding  that  space. 

4.  It  is  possible  to  piece  together  a schedule  for  floors  {1, . . . ,p  + 1}  with  a schedule 
for  floors  {q, . . . , n}:  When  the  former  schedule  first  reaches  floor  p + 1,  go  up  to  floor 
q and  carry  out  the  latter  schedule  (using  the  current  elevator  contents  as  if  they  were 
the  “extras”  in  the  algorithm  of  Theorem  K).  After  finishing  that  schedule,  go  back  to 
floor  p + 1 and  resume  the  previous  schedule. 

5. Consider b = 2, m = 4 and the following behavior of the algorithm:


[Diagram: the contents of floors 1 to 7 and of the elevator at successive stages of the algorithm; not recoverable from the scan.]

Now  2 (in  the  elevator)  is  less  than  3 (on  floor  3). 

[After  constructing  an  example  such  as  this,  the  reader  should  be  able  to  see  how 
to  demonstrate  the  weaker  property  required  in  the  proof  of  Theorem  K.] 

6. Let i and j be minimal with b_i < b'_i and b_j > b'_j. Introduce a new person who
wants to go from i to j. This doesn’t increase max(u_k, d_{k+1}, 1) or max(b_k, b'_k) for any k.
Continue this until b_j = b'_j for all j. Now observe that the algorithm in the text works
also with b replaced by b_k in steps K1 and K3.

8.  Let  the  number  be  Pn,  and  let  Qn  be  the  number  of  permutations  such  that  Uk  = 1 
for  1 < k < n.  Then  Pn  = QiP„~i  + Q2Pn-2  + • • • + QnPo,  Po  = 1-  It  can  be  shown 
that  Qn  = 3n~2  for  n > 2 (see  below),  hence  a generating  function  argument  yields 


P„zn  = (1  - 3z)/(l  -4 z + 2 z2) 


l + z + 2z2  + 6z3  + 20  z4  + 68z5  + • 


2 Pn  = (2  + V2)n  1 + (2  - v/2)n_1. 
To prove that $Q_n = 3^{n-2}$, consider a ternary sequence $x_1x_2\ldots x_n$ such that $x_1 = 2$, $x_n = 0$, and $0 \le x_k \le 2$ for $1 < k < n$. The following rule defines a one-to-one correspondence between such sequences and the desired permutations $a_1a_2\ldots a_n$:

$$a_k = \begin{cases} \max\{j \mid (j < k \text{ and } x_j = 0) \text{ or } j = 1\}, & \text{if } x_k = 0;\\ k, & \text{if } x_k = 1;\\ \min\{j \mid (j > k \text{ and } x_j = 2) \text{ or } j = n\}, & \text{if } x_k = 2.\end{cases}$$

(This  correspondence  was  obtained  by  the  author  jointly  with  E.  A.  Bender.) 
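Both counts in this answer can be checked mechanically. The Python sketch below (an illustration added here, not part of the original answer) expands the generating function through the recurrence implied by its denominator, compares the result with the convolution $P_n = Q_1P_{n-1} + \cdots + Q_nP_0$ and with the closed form, and applies the ternary-sequence rule to every admissible $x_1\ldots x_n$. It assumes that $u_k$ denotes the number of $j \le k$ with $a_j > k$; the function names are hypothetical.

```python
from itertools import permutations, product
from math import sqrt

def p_series(n_max):
    """P_0..P_n_max from (1 - 3z)/(1 - 4z + 2z^2), i.e. P_n = 4P_{n-1} - 2P_{n-2} for n >= 2."""
    p = [1, 1]
    while len(p) <= n_max:
        p.append(4 * p[-1] - 2 * p[-2])
    return p[:n_max + 1]

def p_by_convolution(n_max):
    """P_n = Q_1 P_{n-1} + ... + Q_n P_0, with Q_1 = 1 and Q_n = 3^(n-2) for n >= 2."""
    q = lambda n: 1 if n == 1 else 3 ** (n - 2)
    p = [1]
    for n in range(1, n_max + 1):
        p.append(sum(q(k) * p[n - k] for k in range(1, n + 1)))
    return p

def ternary_to_perm(x):
    """Map x_1..x_n (x_1 = 2, x_n = 0) to a_1..a_n by the three-case rule (1-based values)."""
    n = len(x)
    a = []
    for k in range(1, n + 1):
        if x[k - 1] == 0:      # back to the previous 0-position, or to 1
            a.append(max([j for j in range(1, k) if x[j - 1] == 0] + [1]))
        elif x[k - 1] == 1:    # fixed point
            a.append(k)
        else:                  # forward to the next 2-position, or to n
            a.append(min([j for j in range(k + 1, n + 1) if x[j - 1] == 2] + [n]))
    return tuple(a)

def all_u_equal_one(a):
    """Assumed meaning: u_k = #{j <= k : a_j > k} must be 1 for 1 <= k < n."""
    n = len(a)
    return all(sum(1 for j in range(k) if a[j] > k) == 1 for k in range(1, n))

print(p_series(5))  # [1, 1, 2, 6, 20, 68]
```

The printed coefficients match the series $1 + z + 2z^2 + 6z^3 + 20z^4 + 68z^5 + \cdots$ of this answer.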

9. The number of passes of the cocktail-shaker sort is $2\max(u_1, \ldots, u_n) - (0 \text{ or } 1)$,
since  each  pair  of  passes  (left-right-left)  reduces  each  of  the  nonzero  u’s  by  1. 

10.  Begin  with  a distribution  method  (quicksort  or  radix  exchange)  until  one-reel  files 
are  obtained.  And  be  patient. 


SECTION  5.4.9 

1. $\frac12 - (x \bmod \frac12)^2$ revolutions.

2. The probability that $k = a_{iq}$ and $k + 1 = a_{i'r}$ for fixed $k$, $q$, $r$, and $i \ne i'$ is $f(q,r,k)\,L!\,L!\,(PL - 2L)!/(PL)!$, where

$$f(q,r,k) = \binom{k-1}{q-1}\binom{k-q}{r-1}\binom{PL-k-1}{L-q}\binom{PL-k-1-L+q}{L-r} = \binom{k-1}{q+r-2}\binom{q+r-2}{q-1}\binom{PL-k-1}{2L-q-r}\binom{2L-q-r}{L-q},$$

and

$$\sum_{1\le k<PL}\ \sum_{1\le q,r\le L}|q-r|\,f(q,r,k) = \binom{PL-1}{2L-1}\sum_{1\le q,r\le L}|q-r|\binom{q+r-2}{q-1}\binom{2L-q-r}{L-q}.$$

The probability that $k = a_{iq}$ and $k + 1 = a_{i(q+1)}$ for fixed $k$, $q$, and $i$ is $g(k,q)\,L!\,(PL-L)!/(PL)!$, and

$$\sum_{1\le k<PL}\ \sum_{1\le q<L} g(k,q) = (L-1)\binom{PL-1}{L-1},$$

where $g(k,q) = \binom{k-1}{q-1}\binom{PL-k-1}{L-q-1}$.

[SICOMP 1 (1972), 161-166.]

3. Take the minimum in (5) over the range $2 \le m \le \min(9, n)$.
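A sketch of the dynamic program behind recurrence (5) with the order of merge capped at 9. Only (5) is visible here, so the inner recurrence for forests is an assumption: $A_m(n) = \min_{1\le k\le n-m+1}\bigl(A_1(k) + A_{m-1}(n-k)\bigr)$ for $m \ge 2$, together with $A_1(1) = 0$ and $A_1(n) = \min_m(\alpha mn + \beta n + A_m(n))$. The name `optimum_merge_costs` and the `max_order` parameter are hypothetical.

```python
def optimum_merge_costs(n_max, alpha, beta, max_order=9):
    """Return [A_1(1), ..., A_1(n_max)] under the assumed recurrences:
    A_1(1) = 0;
    A_1(n) = min over 2 <= m <= min(max_order, n) of alpha*m*n + beta*n + A_m(n);
    A_m(n) = min over 1 <= k <= n-m+1 of A_1(k) + A_{m-1}(n-k), for m >= 2."""
    INF = float("inf")
    # A[m][n]: cost of an optimum forest of m trees having n leaves in all
    A = [[INF] * (n_max + 1) for _ in range(max_order + 1)]
    for n in range(1, n_max + 1):
        top = min(max_order, n)
        # forests with m >= 2 trees use A_1(k) only for k < n, so fill them first
        for m in range(2, top + 1):
            A[m][n] = min(A[1][k] + A[m - 1][n - k] for k in range(1, n - m + 2))
        if n == 1:
            A[1][1] = 0
        else:
            A[1][n] = min(alpha * m * n + beta * n + A[m][n] for m in range(2, top + 1))
    return A[1][1:]

print(optimum_merge_costs(4, 1, 1))  # [0, 6, 12, 20]
```

With $\alpha = \beta = 1$, four leaves are best handled by a single 4-way merge, which costs $4\cdot4 + 4 = 20$; capping the order at 2 instead forces binary merges and raises the cost to 24.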

4. (a) $(0.000725(\sqrt{P}+1)^2 + 0.014)L$. (b) Change "$\alpha mn + \beta n$" in formula (5) to "$(0.000725(\sqrt{m}+1)^2 + 0.014)n$." [Computer experiments show that the optimal trees defined by this new recurrence are very similar to those defined by Theorem K with $\alpha = 0.00145$, $\beta = 0.01545$; in fact, trees exist that are optimal for both recurrences, when $30 \le n \le 100$. The change suggested in this exercise saves about 10 percent of the merging time, when $n = 64$ or 100 as in the text's example. This style of buffer allocation was considered already in 1954 by H. Seward, who found that four-way merging minimizes the seek time.]
5. Let $A_m(n)$ and $B_m(n)$ be the cost of optimum sets of $m$ trees whose $n$ leaves are all at (even, odd) levels, respectively. Then $A_1(1) = 0$, $B_1(1) = \alpha + \beta$; $A_m(n)$ and $B_m(n)$ are defined as in (4) when $m \ge 2$; $A_1(n) = \min_{1\le m\le n}(\alpha mn + \beta n + B_m(n))$, $B_1(n) = \min_{1\le m\le n}(\alpha mn + \beta n + A_m(n))$. The latter equations are well-defined in spite of the fact that $A_1(n)$ and $B_1(n)$ are defined in terms of each other!


$A_1(23) = B_1(23) = 268$. [Curiously, $n = 23$ is the only value $< 50$ for which no equal-parity tree with $n$ leaves is optimal in the unrestricted-parity case. Perhaps it is the only such value, when $\alpha = \beta$.]

7. Consider the quantities $\alpha d_1 + \beta e_1, \ldots, \alpha d_n + \beta e_n$ in any tree, where $d_j$ is the degree sum and $e_j$ is the path length for the $j$th leaf. An optimum tree for weights $w_1 \le \cdots \le w_n$ will have $\alpha d_1 + \beta e_1 \ge \cdots \ge \alpha d_n + \beta e_n$. It is always possible to reorder the indices so that $\alpha d_1 + \beta e_1 = \cdots = \alpha d_k + \beta e_k$ where the first $k$ leaves are merged together.

9. Let $d$ minimize $(\alpha m + \beta)/\ln m$. A simple induction using convexity shows that $A_1(n) \ge (\alpha d + \beta)n\log_d n$, with equality when $n = d^t$. A suitable upper bound comes from complete $d$-ary trees, since these have $D(T) = dE(T)$, $E(T) = tn + dr$ for $n = d^t + (d-1)r$, $0 \le r < d^t$.
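The two ingredients of this bound are easy to compute. This Python sketch (names hypothetical) finds the integer $d \ge 2$ that minimizes $(\alpha m + \beta)/\ln m$ by direct search over a finite range (assuming the minimizer is small), and evaluates the external path length $E(T) = tn + dr$ of a complete $d$-ary tree with $n = d^t + (d-1)r$ leaves.

```python
import math

def best_degree(alpha, beta, m_max=100):
    """Integer m in 2..m_max minimizing (alpha*m + beta)/ln m."""
    return min(range(2, m_max + 1), key=lambda m: (alpha * m + beta) / math.log(m))

def complete_tree_path_length(n, d):
    """E(T) = t*n + d*r for the complete d-ary tree with n = d**t + (d-1)*r leaves, 0 <= r < d**t."""
    t = 0
    while d ** (t + 1) <= n:       # largest t with d**t <= n
        t += 1
    r, extra = divmod(n - d ** t, d - 1)
    if extra:
        raise ValueError("a complete d-ary tree needs n = 1 (mod d-1)")
    return t * n + d * r

print(best_degree(0.00145, 0.01545))  # 9
```

With the values $\alpha = 0.00145$, $\beta = 0.01545$ quoted in exercise 4, the best degree is 9, which is consistent with the range $m \le 9$ used in exercise 3.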

10.  See  STOC  6 (1974),  216-229. 

11. Using exercise 1.2.4-38, $f_m(n) = 3qn + 2(n - 3^qm)$, when $2\cdot3^{q-1} \le n/m \le 3^q$; $f_m(n) = 3qn + 4(n - 3^qm)$, when $3^q \le n/m \le 2\cdot3^q$. Thus $f_2(n) + 2n \ge f(n)$, with equality if and only if $4\cdot3^{q-1} \le n \le 2\cdot3^q$; $f_3(n) + 3n = f(n)$; $f_4(n) + 4n \ge f(n)$, with equality if and only if $n = 4\cdot3^q$; and $f_m(n) + mn > f(n)$ for all $m \ge 5$.

12. Use the specifications —, 1:1, 1:1:1, 1:1:1:1 or 2:2, 2:3, 2:2:2, ..., $\lfloor n/3\rfloor{:}\lfloor (n+1)/3\rfloor{:}\lfloor (n+2)/3\rfloor$, ...; this gives trees with all leaves at level $q + 2$, for $4\cdot3^q \le n \le 4\cdot3^{q+1}$. (When $n = 4\cdot3^q$, two such trees are formed.)

14.  The  following  tree  specifications  were  found  for  n = 1,  2,  3,  . . . by  exhaustively 
examining  all  partitions  of  n : — , 1:1,  1:1:1,  1:1:1:1,  1:1:1:1:1,  1:1:1:1:1:1,  1:1:1:1:3, 
1:1:3:3,  3:3:3,  1:3:3:3,  3:4:4,  3:3:3:3,  3:3:3:4,  3:3:4:4,  3:4:4:4,  4:4:4:4,  ...,  5:6:6:6:12, 
6:6:6:6:12, 6:6:6:6:13, .... (The degrees seem to be always $\le 6$, but such a result
appears  to  be  quite  difficult  to  prove.) 

15.  If  a people  initially  got  on  the  elevator,  the  togetherness  rating  increases  by  at 
most  a + b at  the  first  stop.  When  it  next  stops  at  the  initial  floor,  the  rating  increases 
by at most $b + m - a$. Hence the rating increases at most $kb + (k-1)m$ after $k$ stops.

16.  Eleven  stops:  123466  to  floor  2,  334466  to  3,  444666  to  4,  256666  to  5,  466666 
to  6,  123445  to  4,  112335  to  5,  222333  to  3,  122225  to  2,  111555  to  5,  111111  to  1. 
[This  is  minimal,  for  a 10-stop  solution  with  any  elevator  capacity  can,  by  symmetry, 
be arranged to stop on floors 2, 3, 4, 5, 6, $p_2$, $p_3$, $p_4$, $p_5$, 1 in that order, where $p_2p_3p_4p_5$
is a permutation of $\{2, 3, 4, 5\}$; such schedules are possible only when $b > 8$. See Martin
Gardner,  Knotted  Doughnuts  (New  York:  Freeman,  1986),  Chapter  10.] 
17. There are at least $(bn)!/b!^n$ configurations; and the number that can be obtained from a given one after $s$ stops is at most $\bigl((n-1)\binom{b+m}{b}\bigr)^s$, which is less than $\bigl(n((b+m)e/b)^b\bigr)^s$ by exercise 1.2.6-67. Hence some configuration requires

$$s\bigl(\ln n + b(1 + \ln(1 + m/b))\bigr) \ge \ln(bn)! - n\ln b! \ge bn\ln bn - bn - n\bigl((b+1)\ln b - b + 1\bigr)$$

by exercise 1.2.5-24.

Notes: Using the fact that $1/(x+y) \ge \frac12\min(1/x, 1/y)$ when $x$ and $y$ are positive,
we  can  express  this  lower  bound  in  the  convenient  form 


n 


nlog(l  + n)  \\ 
log(l  + m/b) ) ) 


Related  results  have  been  obtained  by  A.  Aggarwal  and  J.  S.  Vitter,  CACM  31  (1988), 
1116-1127,  who  also  established  the  matching  upper  bound 


O 


nlog(l  + u) 
log(l  + m/b)  J )' 


See  also  M.  H.  Nodine  and  J.  S.  Vitter,  ACM  Symposium  on  Parallel  Algorithms  and 
Architectures  5 (1993),  120-129,  for  extensions  to  several  disks. 

18. The expected number of stops is $\sum_{s\ge1}p_s$, where $p_s$ is the probability that at least $s$ stops are needed. Let $q_s = 1 - p_{s+1}$ be the probability that at most $s$ stops are needed. Then exercise 17 shows that $q_s \le f(s - 1 + [s = 0])$, where $f(s) = b!^n\alpha^s/(bn)!$ and $\alpha = n((b+m)e/b)^b$. If $f(t-1) < 1 \le f(t)$ then $\sum_{s\ge1}p_s \ge p_1 + \cdots + p_t = t - (q_0 + \cdots + q_{t-1}) \ge t - (f(0) + f(0) + \cdots + f(t-2)) \ge t - (\alpha^{1-t} + \alpha^{1-t} + \cdots + \alpha^{-1}) > t - 1 \ge L - 1$.

19.  Consider  doing  step  (vii)  backwards,  distributing  the  records  into  bin  1,  then  bin  2. 
This  operation  is  precisely  what  step  (iv)  is  simulating  on  the  key  file.  [Princeton 
Conference  on  Information  Sciences  and  Systems  6 (1972),  140-144.] 

20.  The  internal  sort  must  be  carefully  chosen,  with  paging  in  mind;  methods  such 
as  shellsort,  address  calculation,  heapsort,  and  list  sorting  can  be  disastrous  if  the 
actual  internal  memory  is  small,  since  they  require  a large  “working  set”  of  pages. 
Quicksort,  radix  exchange,  and  sequentially  allocated  merge  or  radix  sorting  are  much 
better  suited  to  a paging  environment. 

Some  things  the  designer  of  an  external  sort  can  do  that  are  virtually  impossible 
to  include  in  an  automatically  paged  method  are:  (i)  Forecasting  the  input  file  that 
should  be  read  next,  so  that  the  data  is  available  when  it  is  required;  (ii)  choosing  the 
buffer  sizes  and  the  order  of  merge  according  to  hardware  and  data  characteristics. 

On  the  other  hand  a virtual  machine  is  considerably  easier  to  program,  and  it  can 
give  results  that  aren’t  bad,  if  the  programmer  is  careful  and  knows  the  properties  of 
the  underlying  actual  machine.  The  first  substantial  study  of  this  question  was  made 
by  Brawn,  Gustavson,  and  Mankin  [CACM  13  (1970),  483-494.] 

21. $\lceil (L-j)/D\rceil$; see CMath, Eq. (3.24).
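A quick check of this count, assuming plain round-robin striping in which block $i$ of a run of $L$ blocks goes to disk $i \bmod D$, so that disk $j$ receives $\lceil (L-j)/D\rceil$ blocks. Both functions are hypothetical helpers written for this illustration.

```python
def blocks_per_disk(L, D):
    """Blocks of an L-block run on each of D disks when block i goes to disk i mod D."""
    counts = [0] * D
    for i in range(L):
        counts[i % D] += 1
    return counts

def blocks_per_disk_formula(L, D):
    """ceil((L - j)/D) for j = 0, 1, ..., D-1, written with floor division."""
    return [-((j - L) // D) for j in range(D)]

print(blocks_per_disk(10, 4), blocks_per_disk_formula(10, 4))  # [3, 3, 2, 2] [3, 3, 2, 2]
```

Summing the formula over $j$ gives back $L$, which is the identity the CMath reference supplies.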

22. After reading a group of $D$ blocks that contains $a_j$, we might need to know $a_{j+D-1}$ before reading the next group of $D$ blocks. And if we store $a_{j+D-1}$ with $a_j$, we also need the values $a_0, \ldots, a_{D-2}$ in some sort of file header to get the process started.

But with this scheme we cannot write blocks $a_0\ldots a_{D-1}$ until we have computed $a_D\ldots a_{2D-2}$, so we will need $3D - 1$ output buffers instead of $2D$ to keep writing continuously. It is therefore better to put the $a$'s in a separate (short) file. [The same analysis applies to randomized striping.]


5.4.9 


ANSWERS  TO  EXERCISES  699 


23. (a) Algorithm 5.4.6F needs 4 input buffers, each of superblock size $DB$. (If we count output buffers as well, we have a total of $6DB$ buffer records in memory with Algorithm 5.4.6F and $5DB$ with SyncSort.)

(b) While we are reading a group of $D$ blocks we need buffer space for the previous $D$ blocks and one unfinished block, for a total of $(2D + 1)B$ records. (Output requires another $2DB$. But many data processing operations that do 2-way merging on input actually produce comparatively little output.)

24. Let the $l$th block in chronological order be block $j_l$ of run $k_l$; in particular, $j_l = 0$ and $k_l = l$ for $1 \le l \le P$. We will read that block at time $t_l = \sum_{k=1}^{P} t_{lkd}$, where

$$t_{lkd} = \bigl|\{r \mid 1 \le r \le l \text{ and } k_r = k \text{ and } (x_k + j_r) \bmod D = d\}\bigr|$$

is the number of blocks of run $k$ on disk $d$ that are chronologically $\le l$, and $d = (x_{k_l} + j_l) \bmod D$. Let $u_{lk} = |\{r \mid 1 \le r \le l \text{ and } k_r = k\}|$; then

$$t_{lkd} = \left\lceil \frac{u_{lk} - (d - x_k) \bmod D}{D} \right\rceil,$$

because $j_r$ runs through the values $0, 1, \ldots, u_{lk} - 1$ when $1 \le r \le l$ and $k_r = k$. The sequence $t_l$ for the example of (19), (20), and (21) is

11111 22223 43456 34567 82345 67893


If $l > P$, the number of buffer blocks we need as we begin to merge from the $l$th block in chronological order is $I_l + D + P$, where $I_l$ is the number of "inversions-with-equality" of $t_l$, namely $|\{r \mid r > l \text{ and } t_r \le t_l\}|$, the number of bufferfuls that we've read but aren't ready to use; $D$ represents the buffers into which the next input is going, and $P$ represents the partially full buffers from which we are currently merging. (With special care, using links as in SyncSort, we could reduce the latter requirement from $P$ to $P - 1$, but the extra complication is probably not worthwhile.)

So the problem boils down to getting an upper bound on $I_l$. We may assume that the input runs are infinitely long. Suppose $s$ of the elements $\{t_1, \ldots, t_l\}$ are greater than $t_l$; then $t_l$ has $t_lD - l + s$ inversions-with-equality, because exactly $t_lD$ elements are $\le t_l$. It follows that the maximum $I_l$ occurs when $s = 0$ and $t_l$ is a left-to-right maximum. We have $\sum_{k=1}^{P}u_{lk} = l$, hence by the formulas for $t_l$ above,

$$I_l \le \max(t_lD - l) \le \sum_{k=1}^{P}\bigl(u_{lk} - (d - x_k) \bmod D + D - 1 - u_{lk}\bigr) = P(D-1) - \sum_{k=1}^{P}(d - x_k) \bmod D \le P(D-1) - \min_{0\le d<D}\sum_{k=1}^{P}(d - x_k) \bmod D,$$

and there exist chronological orders for which this upper bound is attained.

Suppose $r_t$ of the $x_k$ are equal to $t$. We want to choose the $x_k$ so that $\min_{0\le d<D}s_d$ is maximized, where $s_d = \sum_{k=1}^{P}(d - x_k) \bmod D = \sum_{t=0}^{D-1}\bigl((d - t) \bmod D\bigr)r_t$. We can assume that the minimum occurs at $d = 0$. Then $s_1 = s_0 + P - r_1D$, $s_2 = s_1 + P - r_2D$, ..., hence we have $r_1 \le \lfloor P/D\rfloor$, $r_1 + r_2 \le \lfloor 2P/D\rfloor$, ...; it follows that the minimum is

$$s_0 = (D-1)r_1 + (D-2)r_2 + \cdots + r_{D-1} \le \sum_{k=1}^{D-1}\lfloor kP/D\rfloor = \tfrac12\bigl((P-1)(D-1) + \gcd(P,D) - 1\bigr),$$
by exercise 1.2.4-37. This bound is achieved when $x_j = \lceil jD/P\rceil - 1$ for $1 \le j \le P$. With such $x_j$ we can handle every chronological sequence at full speed if we have $I_{\max} + D + P = P(D-1) - \frac12\bigl((P-1)(D-1) + \gcd(P,D) - 1\bigr) + D + P$ input buffers containing $B$ records each. (This is pretty good when $D = 2$ or 3.)
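The counting steps in this answer can be verified by brute force for small parameters. The sketch below (hypothetical names) checks that blocks $0, \ldots, u-1$ of a run starting on disk $x$ put $\lceil (u - (d-x) \bmod D)/D\rceil$ blocks on disk $d$; the identity $\sum_{k=1}^{D-1}\lfloor kP/D\rfloor = \frac12((P-1)(D-1) + \gcd(P,D) - 1)$; and that the placement $x_j = \lceil jD/P\rceil - 1$ attains the largest possible value of $\min_d\sum_k(d - x_k) \bmod D$.

```python
from itertools import product
from math import gcd

def blocks_on_disk(u, x, d, D):
    """Brute force: blocks j = 0..u-1 of a run whose block 0 is on disk x that land on disk d."""
    return sum(1 for j in range(u) if (x + j) % D == d)

def blocks_on_disk_formula(u, x, d, D):
    """ceil((u - (d - x) mod D)/D), the closed form used for t_lkd."""
    c = (d - x) % D
    return -((c - u) // D)

def min_shift_sum(xs, D):
    """min over 0 <= d < D of s_d = sum_k (d - x_k) mod D."""
    return min(sum((d - x) % D for x in xs) for d in range(D))

def floor_sum(P, D):
    return sum(k * P // D for k in range(1, D))

def spread_starts(P, D):
    """x_j = ceil(jD/P) - 1 for 1 <= j <= P."""
    return [-((-j * D) // P) - 1 for j in range(1, P + 1)]

print(spread_starts(3, 4), min_shift_sum(spread_starts(3, 4), 4), floor_sum(3, 4))  # [1, 2, 3] 3 3
```

The exhaustive maximum over all placements is only feasible for tiny $P$ and $D$, but it confirms that the bound cannot be improved there.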

25. Notice that at Time 4, we go back to reading $f_1$ on disk 0:


Active  reading 

Time  1 

eobogoaoCo 

Time  2 

fidodidifo 

Time  3 

02/1062  Qi  ds 

Time  4 

/iei  bigiai 

Time  5 

0,2/2  hi  63(^2 

Time  6 

^403/3  &2  e4 

Time  7 

01(13/3  ? e4 

Time  8 

? d5d6  ? ? 

Active  merging  Scratch  Waiting  for 


( 

-) 

do 

cio 

boCo(eogo 

-) 

do 

ao  bo  Co  do 

eo/o9o(did2/i 

-) 

ho 

o-o  bo  Co  do  eo  fo  go  ho 

d\{d2e2d3fig\a,2) 

ei 

(io5o  CodiCi  fogoho 

d2e.2d3a\f\b\g\() 

a2 

(1261  codsezfigiho 

f2e-s(h\g2 

-) 

c^4 

a^bi  cod^es  f2giho 

(hi  62  92  03/304 

-) 

Cl 

02  b\  Cid^esf2gih0 

hi6292d3/3e4( 

?) 

ds 

26.  While  D blocks  are  being  read  and  D are  being  written,  the  merging  procedure 
might  generate  up  to  P + Q — 1 blocks  of  output,  under  the  assumptions  of  (24).  (Not 
P + Q,  since  only  one  merge  buffer  becomes  totally  empty.)  Reading  is  as  fast  as 
writing,  so  D + P + Q — 1 output  buffers  are  necessary  and  sufficient  to  prevent  output 
hangup. 

However,  at  most  D blocks  are  output  for  every  D blocks  of  input,  on  the  average, 
so about $3D$ output buffers should be adequate in practice.

27. (a) $E_n(m_1, \ldots, m_p) = \sum_{t\ge1}q_t$, where $q_t$ is the probability that some urn contains at least $t$ balls. Clearly $q_t \le 1$ and

$$q_t \le \sum_{k=0}^{n-1}\Pr(\text{urn } k \text{ contains at least } t \text{ balls}) = n\Pr\bigl(S_n(m_1, \ldots, m_p) \ge t\bigr).$$

(b) The probability generating function of $S_n(m_1, \ldots, m_p)$ is

$$P(z) = \prod_{k=1}^{p}z^{q_k}\bigl(1 + (z-1)r_k/n\bigr),$$

where $q_k = \lfloor m_k/n\rfloor$ and $r_k = m_k \bmod n$. Now $1 + \alpha \le (1 + \alpha/n)^n$ and $1 + \alpha r/n \le (1 + \alpha/n)^r$ when $\alpha > 0$; hence we have $\Pr(S_n(m_1, \ldots, m_p) \ge t) \le (1+\alpha)^{-t}P(1+\alpha) \le (1+\alpha)^{-t}\prod_{k=1}^{p}(1+\alpha/n)^{m_k} = (1+\alpha)^{-t}(1+\alpha/n)^m$.

If $t \le m/n$, we use the "1" term in the stated minimum. If $t > m/n$, the quantity $(1+\alpha)^{-t}(1+\alpha/n)^m$ takes its minimum value $(n-1)^{m-t}m^m/\bigl(n^mt^t(m-t)^{m-t}\bigr)$ when $\alpha = (nt - m)/(m - t)$.

28. Numerical evidence seems to support this natural conjecture. For example, we have

$E_{10}(1,1,1,1,1,1,1,1) = 2.3993180$,
$E_{10}(2,1,1,1,1,1,1) = 2.364540$,
$E_{10}(2,2,1,1,1,1) = 2.32076$, $E_{10}(3,1,1,1,1,1) = 2.29958$,
$E_{10}(2,2,2,1,1) = 2.2628$, $E_{10}(3,2,1,1,1) = 2.2460$, $E_{10}(4,1,1,1,1) = 2.2076$,
$E_{10}(2,2,2,2) = 2.178$, $E_{10}(3,2,2,1) = 2.166$, $E_{10}(3,3,1,1) = 2.152$, $E_{10}(4,2,1,1) = 2.138$, $E_{10}(5,1,1,1) = 2.090$,
$E_{10}(3,3,2) = 2.02$, $E_{10}(4,2,2) = 2.01$, $E_{10}(4,3,1) = 2.00$, $E_{10}(5,2,1) = 1.98$, $E_{10}(6,1,1) = 1.94$,
$E_{10}(4,4) = 1.7$, $E_{10}(5,3) = 1.7$, $E_{10}(6,2) = 1.7$, $E_{10}(7,1) = 1.7$.
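The entries with few runs can be reproduced exactly. This sketch assumes the cyclic occupancy model suggested by exercise 27(b): run $k$ puts one ball into each of $m_k$ cyclically consecutive urns, starting at an independent, uniformly random urn. It enumerates all starting positions with exact fractions, and also evaluates the upper bound $\sum_{t\ge1}\min(1, nB_t)$ of exercise 27, where $B_t = 1$ for $t \le m/n$ and $B_t = (n-1)^{m-t}m^m/(n^mt^t(m-t)^{m-t})$ otherwise. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_max_occupancy(n, runs):
    """E_n(m_1, ..., m_p) by exhaustive enumeration of the run starting positions."""
    total = Fraction(0)
    # Rotating every run by the same amount does not change the maximum,
    # so the first run may be assumed to start at urn 0.
    for offsets in product(range(n), repeat=len(runs) - 1):
        urns = [0] * n
        for start, m in zip((0,) + offsets, runs):
            for i in range(m):
                urns[(start + i) % n] += 1
        total += max(urns)
    return total / n ** (len(runs) - 1)

def occupancy_bound(n, m):
    """Upper bound sum_{t=1}^{m} min(1, n * B_t) on E_n for m balls in all."""
    total = 0.0
    for t in range(1, m + 1):
        if t <= m / n:
            b = 1.0
        else:
            b = (n - 1) ** (m - t) * m ** m / (n ** m * t ** t * (m - t) ** (m - t))
        total += min(1.0, n * b)
    return total

print(expected_max_occupancy(10, [6, 1, 1]))  # 97/50
```

The result 97/50 = 1.94 agrees with the table. Because the enumeration grows like $n^{p-1}$, it is only practical for a handful of runs, and the bound is much weaker than the exact value for such small cases.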
29. (a) At time $t$, all disks are reading blocks that occur no earlier than the block marked at time $t$. The next $Q$ blocks are never removed from the scratch buffers once they have been read. Thus the relevant blocks on disk $j$ all are read by time $\le t + N_j$; they must all participate in the merge by time $t + \max(N_0, \ldots, N_{D-1})$.

(b) If the $(Q+1)$st block after a marked block is not removed, the same argument applies. Otherwise the previous $Q$ are not marked, and the $Q + 2$ blocks cannot all be on different disks.

(c) Divide the chronological order of blocks into groups of size $Q + 2$, and consider any particular group. If there are $m_k$ blocks from run $k$, then the numbers $N_j$ are equivalent to the number of balls in the $j$th urn, in a cyclic occupancy problem with $n = D$ and $m = Q + 2$. Thus the expected number of marked cells in any group is at most the upper bound in exercise 27(b). Calling that upper bound $e_n(m)$, we may take $r(d, m) = (d/m)e_d(m)$.

[Actually this function $r(2, m)$ is not monotonic in $m$ when $m$ is small. Therefore the entries listed for $r(2, 4)$ and $r(2, 12)$ in Table 2 are actually the values of $r(2, 3)$ and $r(2, 11)$; additional buffers cannot increase the number of marked blocks.]

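30. Let $l = \lceil (s + \sqrt{2s})\ln d\rceil$, $\alpha = \sqrt{2/s}$. Then

$$e_d(sd\ln d) \le l + \sum_{t>l}\frac{d(1+\alpha/d)^{sd\ln d}}{(1+\alpha)^t} = l + \frac{d(1+\alpha/d)^{sd\ln d}}{\alpha(1+\alpha)^l} < l + \alpha^{-1}\exp\Bigl((\ln d)\bigl(1 + s\alpha - (s + \sqrt{2s})\ln(1+\alpha)\bigr)\Bigr),$$

and $(s + \sqrt{2s})\ln(1+\alpha) \ge s\alpha + 1 - \alpha/3$. Therefore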

1 < r (d,  sd  In  d)  = ed(gS1dn1^)  < 1 + yj\' + {1  + \Uslnd  + °^1  (1°S  d)2))  ’ 

if $s/(\log d)^2 \to \infty$. Convergence to this asymptotic behavior is rather slow (see Table 2).

31. (When $Q = 0$, we mark the first block and then repeatedly mark the next block that shares a disk with one of the blocks in the group starting with the previously marked block. For example, if the chronological order of disk accesses is 112020121210122, the marking would be 112020121210122. Therefore as $P \to \infty$, we read an average of $Q(D)n$ blocks during $n$ units of time, where $Q$ is Ramanujan's function, defined in Eq. 1.2.11.3-(2). By contrast, $r(d, 2) = (d + 1)/2$ gives a much more pessimistic estimate.)
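A sketch of the marking rule for $Q = 0$, as we read it from the answer: the first block is marked, and a block is marked whenever its disk already occurs in the group that begins at the most recent mark. Under that reading each group is a maximal run of distinct disks, whose expected length for random disk numbers is Ramanujan's $Q(D) = 1 + (D-1)/D + (D-1)(D-2)/D^2 + \cdots$; the code computes both exactly. Names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mark_blocks(disks):
    """0-based positions marked by the Q = 0 rule (our reading of the answer)."""
    marks, group = [], set()
    for pos, disk in enumerate(disks):
        if not marks or disk in group:   # first block, or a repeated disk: start a new group
            marks.append(pos)
            group = {disk}
        else:
            group.add(disk)
    return marks

def ramanujan_Q(n):
    """Q(n) = 1 + (n-1)/n + (n-1)(n-2)/n^2 + ..., as an exact fraction."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term:
        total += term
        k += 1
        term *= Fraction(n - k, n)
    return total

def expected_group_length(D):
    """Mean number of distinct disks before the first repeat, over all D**(D+1) sequences."""
    total = 0
    for seq in product(range(D), repeat=D + 1):   # a repeat is certain within D+1 draws
        seen = set()
        for disk in seq:
            if disk in seen:
                break
            seen.add(disk)
        total += len(seen)
    return Fraction(total, D ** (D + 1))

print(mark_blocks([int(c) for c in "112020121210122"]))  # [0, 1, 4, 7, 9, 12, 14]
```

Under this reading, the example sequence is marked at positions 0, 1, 4, 7, 9, 12, and 14 (counting from 0), and `ramanujan_Q(3)` is 17/9.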

SECTION  5.5 

1.  It  is  difficult  to  decide  which  sorting  algorithm  is  best  in  a given  situation.  | 

2.  For  small  N,  list  insertion;  for  medium  N,  say  N = 64,  list  merge;  for  large  N, 
radix  list  sort. 

3. (Solution by V. Pratt.) Given two nondecreasing runs $\alpha$ and $\beta$ to be merged, determine in a straightforward way the subruns $\alpha_1\alpha_2\alpha_3\beta_1\beta_2\beta_3$ such that $\alpha_2$ and $\beta_2$ contain precisely the keys of $\alpha$ and $\beta$ having the median value of the entire file. By successive "reversals" we can transform $\alpha_1\alpha_2\alpha_3\beta_1\beta_2\beta_3$ into $\alpha_1\beta_1\alpha_2\beta_2\alpha_3\beta_3$; this reduces the problem to the merging of subfiles $\alpha_1\beta_1$ and $\alpha_3\beta_3$ that are of length $\le N/2$.
A considerably more complicated algorithm due to L. Trabb Pardo provides the best possible asymptotic answer to this problem: We can do stable merging in $O(N)$ time and stable sorting in $O(N\log N)$ time, using only $O(\log N)$ bits of auxiliary memory for a fixed number of index variables, without transforming the records being sorted in any way [SICOMP 6 (1977), 351-372]. The same time and space bounds have been achieved with much smaller constant factors by B.-C. Huang and M. A. Langston, Comp. J. 35 (1992), 643-650. See also A. Symvonis, Comp. J. 38 (1995), 681-690, for stable merging of $M$ items with $N$ when $M$ is much smaller than $N$.
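A minimal Python sketch of Pratt's idea, under a few assumptions of our own: the median key of the two runs is located by a linear scan, each rearrangement is a rotation built from three reversals, and records only need `<`. It is neither Trabb Pardo's algorithm nor the Huang and Langston method, and the names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def _reverse(a, lo, hi):
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo, hi = lo + 1, hi - 1

def _rotate(a, lo, mid, hi):
    """Turn a[lo:mid] + a[mid:hi] into a[mid:hi] + a[lo:mid] by three reversals."""
    _reverse(a, lo, mid)
    _reverse(a, mid, hi)
    _reverse(a, lo, hi)

def _kth(a, lo, mid, hi, k):
    """k-th smallest (0-based) key of the sorted runs a[lo:mid] and a[mid:hi]."""
    i, j = lo, mid
    while True:
        if i == mid:
            return a[j + k]
        if j == hi:
            return a[i + k]
        if k == 0:
            return a[j] if a[j] < a[i] else a[i]
        if a[j] < a[i]:
            j += 1
        else:
            i += 1
        k -= 1

def stable_merge(a, lo, mid, hi):
    """Stably merge sorted a[lo:mid] (alpha) and a[mid:hi] (beta) in place."""
    if lo >= mid or mid >= hi:
        return
    med = _kth(a, lo, mid, hi, (hi - lo - 1) // 2)
    i1, i2 = bisect_left(a, med, lo, mid), bisect_right(a, med, lo, mid)
    j1, j2 = bisect_left(a, med, mid, hi), bisect_right(a, med, mid, hi)
    _rotate(a, i1, mid, j1)              # alpha2 alpha3 beta1 -> beta1 alpha2 alpha3
    q = i1 + (j1 - mid) + (i2 - i1)      # alpha3 now starts here
    _rotate(a, q, j1, j2)                # alpha3 beta2 -> beta2 alpha3
    stable_merge(a, lo, i1, i1 + (j1 - mid))        # alpha1 beta1 (keys < median)
    s = q + (j2 - j1)                               # alpha3 now starts here
    stable_merge(a, s, j2, hi)                      # alpha3 beta3 (keys > median)

class Rec:
    """A record compared by key only, so that stability is observable."""
    def __init__(self, key, tag):
        self.key, self.tag = key, tag
    def __lt__(self, other):
        return self.key < other.key
```

Every key in $\alpha_1\beta_1$ is smaller than the median and every key in $\alpha_3\beta_3$ is larger, so equal keys never cross between the three parts, and within each part the $\alpha$ records stay ahead of the $\beta$ records; that is why the result is stable.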

4.  Only  straight  insertion,  list  insertion,  and  list  merge.  The  variants  of  quicksort 
could  be  made  parsimonious,  but  only  at  the  expense  of  extra  work  in  the  inner  loops 
(see  exercise  5.2.2-24). 

Parsimonious  methods  are  especially  useful  when  the  result  of  a comparison  is  not 
100%  reliable;  see  D.  E.  Knuth,  Lecture  Notes  in  Comp.  Sci.  606  (1992),  61-67. 


SECTION  6.1 

1. $\sqrt{(N^2 - 1)/12}$; see Eq. 1.2.10-(22).

2. S1'. [Initialize.] Set P ← FIRST.

S2'. [Compare.] If K = KEY(P), the algorithm terminates successfully.

S3'. [Advance.] Set P ← LINK(P).

S4'. [End of file?] If P ≠ Λ, go back to S2'. Otherwise the algorithm terminates unsuccessfully. |
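Algorithm S' transcribes directly into a language with references. In this Python sketch (hypothetical names) `None` plays the role of Λ, and the function also returns the number of key comparisons $C$, the quantity that appears in the running time of the MIX program for exercise 3. Unlike step S1', the loop also tolerates an empty list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    link: Optional["Node"] = None      # LINK(P); None stands for the null link

def search_list(first, K):
    """Algorithm S' on a singly linked list; returns (node or None, comparisons C)."""
    p, comparisons = first, 0          # S1'
    while p is not None:               # S4' test, checked first so an empty list works
        comparisons += 1
        if p.key == K:                 # S2'
            return p, comparisons
        p = p.link                     # S3'
    return None, comparisons

lst = Node(5, Node(3, Node(8)))
print(search_list(lst, 7))  # (None, 3)
```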


3.  KEY      EQU   3:5
    LINK     EQU   1:2
    START    LDA   K           1
             LD1   FIRST       1
    2H       CMPA  0,1(KEY)    C
             JE    SUCCESS     C
             LD1   0,1(LINK)   C-S
             J1NZ  2B          C-S
    FAILURE  EQU   *           1-S   |

The running time is $(6C - 3S + 4)u$.

4. Yes, if we have a way to set "KEY(Λ)" equal to $K$. [But the technique of loop duplication used in Program Q' has no effect in this case.]

5.  No;  Program  Q always  does  at  least  as  many  operations  as  Program  Q'. 

6. Replace line 08 by JE *+4; CMPA KEY+N+2,1; JNE 3B; INC1 1; and change lines 03-04 to ENT1 -2-N; 3H INC1 3.


7. Note that $C_N = \frac12C_{N-1} + 1$.

8. Euler's summation formula gives

$$H_n^{(x)} = \zeta(x) + \frac{n^{1-x}}{1-x} + \frac{1}{2n^x} - \frac{B_2\,x}{2!\,n^{x+1}} + \frac{B_3\,x(x+1)}{3!\,n^{x+2}} + O(n^{-x-3}).$$

[Complex variable theory tells us that

$$\zeta(x) = 2^x\pi^{x-1}\sin(\tfrac12\pi x)\,\Gamma(1-x)\,\zeta(1-x),$$

a formula that is particularly useful when $x < 0$.]

9. (a) Yes: $C_N = N - N^{-\theta}H_{N-1}^{(-\theta)} = N + 1 - N^{-\theta}H_N^{(-\theta)} = \frac{\theta N}{\theta+1} + \frac12 + O(N^{-\theta})$.

(b)  CN  = 4,  (1  + IV/(1  - (V)))  = I T~e(N  + ^1_7r(l  - 9)  + 1)  + OtIV1-2*). 

(c)  When  9 < 0,  (n)  is  not  a probability  distribution;  (16)  gives  the  estimate 
Cn  = -if#r(l  - 9)N1+e  + Q{N1+ 2e)  + 0(1)  instead  of  (15). 
10. $p_1 \le \cdots \le p_N$; (maximum $C_N$) $= (N+1) -$ (minimum $C_N$). [Similarly in the unequal-length case, the maximum average search time is $L_1(1+p_1) + \cdots + L_N(1+p_N)$ minus the minimum average search time.]

11. (a) The terms of $f_{m-1}(x_{i_1}, \ldots, x_{i_{m-1}})p_i$ are just the probabilities of the possible sequences of requests that could have preceded, leaving $R_i$ in position $m$. (b) The second identity comes from summing $\binom{n}{m}$ cases of the first, on the different $m$-subsets of $X$, noting the number of times each $P_{nk}$ occurs. The third identity is a consequence of the second, by inversion. [Alternatively, the principle of inclusion and exclusion could be used.] (c) $\sum_{m\ge0}mP_{nm} = nQ_{nn} - Q_{n(n-1)}$; hence

$$C_N = 1 + 2\sum_{1\le i<j\le N}\frac{p_ip_j}{p_i + p_j}.$$


Notes: W. J. Hendricks [J. Applied Probability 9 (1972), 231-233] found a simple formula for the steady-state probability of each permutation of the records. For example, when $N = 4$ the sequence will be $R_3R_1R_4R_2$ with limiting probability

$$\frac{p_3}{p_3+p_1+p_4+p_2}\cdot\frac{p_1}{p_1+p_4+p_2}\cdot\frac{p_4}{p_4+p_2}\cdot\frac{p_2}{p_2}.$$

In  fact,  this  distribution  had  already  been  obtained  by  M.  L.  Tsetlin  in  his  Ph.D.  thesis 
at  Moscow  University  in  1964,  and  published  in  Chapter  1 of  his  Russian  book  Studies 
in  Automata  Theory  and  Simulation  of  Biological  Systems  (1969). 

James Bitner [SICOMP 8 (1979), 82-85] proved that, if the list is originally in random order, the expected search time after $t$ random requests exceeds $C_N$ by the quantity $\frac12\sum_{i<j}(p_i - p_j)^2(1 - p_i - p_j)^t/(p_i + p_j)$. Thus, $t$ searches require fewer than $tC_N + \frac12\sum_{i<j}(p_i - p_j)^2/(p_i + p_j)^2 \le tC_N + \frac12\binom{N}{2}$ comparisons altogether, on the average. See P. Flajolet, D. Gardy, and L. Thimonier, Discrete Applied Math. 39 (1992), 207-229, §6, for instructive proofs via generating functions.
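Hendricks's product formula can be checked against the classical move-to-front steady-state cost $1 + 2\sum_{i<j}p_ip_j/(p_i+p_j)$ by exact enumeration over all arrangements. The sketch below does this with Python fractions; the probabilities and the function names are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import permutations

def hendricks_probability(order, p):
    """Limiting probability of the arrangement `order` (front first) under move-to-front:
    the product of p[i] / (sum of p over i and every record after it)."""
    prob, rest = Fraction(1), sum(p[i] for i in order)
    for i in order:
        prob *= p[i] / rest
        rest -= p[i]
    return prob

def mtf_cost(p):
    """1 + 2 * sum_{i<j} p_i p_j / (p_i + p_j)."""
    n = len(p)
    return 1 + 2 * sum(p[i] * p[j] / (p[i] + p[j]) for i in range(n) for j in range(i + 1, n))

def steady_state_cost(p):
    """Average comparisons sum_k k * p[order[k-1]], weighted by the arrangement probabilities."""
    total = Fraction(0)
    for order in permutations(range(len(p))):
        cost = sum((pos + 1) * p[i] for pos, i in enumerate(order))
        total += hendricks_probability(order, p) * cost
    return total

p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
print(steady_state_cost(p) == mtf_cost(p))  # True
```

The agreement is exact because, under this distribution, $R_i$ precedes $R_j$ with probability $p_i/(p_i+p_j)$.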

12. $C_N = 2^{1-N} + 2\sum_{n=0}^{N-2}1/(2^n+1)$, which converges rapidly to $2\alpha' \approx 2.5290$; exercise 5.2.4-13 gives the value of $\alpha'$ to 40 decimal places.
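A check of this formula. It assumes that the exercise asks for the move-to-front cost $1 + 2\sum_{i<j}p_ip_j/(p_i+p_j)$ with $p_k = 2^{-k}$ for $1 \le k < N$ and $p_N = 2^{1-N}$; the question itself is not reproduced here. Under that assumption the closed form $2^{1-N} + 2\sum_{n=0}^{N-2}1/(2^n+1)$ agrees exactly with the double sum. Names are hypothetical.

```python
from fractions import Fraction

def mtf_cost(p):
    n = len(p)
    return 1 + 2 * sum(p[i] * p[j] / (p[i] + p[j]) for i in range(n) for j in range(i + 1, n))

def binary_distribution(N):
    """p_k = 2^-k for k < N, and p_N = 2^(1-N), so that the p's sum to 1."""
    return [Fraction(1, 2 ** k) for k in range(1, N)] + [Fraction(1, 2 ** (N - 1))]

def closed_form_binary(N):
    return Fraction(1, 2 ** (N - 1)) + 2 * sum(Fraction(1, 2 ** n + 1) for n in range(N - 1))

print(closed_form_binary(4), float(closed_form_binary(40)))
```

For $N = 4$ both expressions equal 263/120, and for large $N$ the value settles near 2.529.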

13. After evaluating the rather tedious sum

$$\sum_{k=1}^{n}k^2H_{n+k} = \frac{n(n+1)(2n+1)}{6}(2H_{2n} - H_n) - \frac{n(n+1)(10n-1)}{36},$$

we obtain the answer

$$C_N = \tfrac43N - \tfrac23(2N+1)(H_{2N} - H_N) + \tfrac56 - \tfrac13(N+1)^{-1} \approx .409N.$$
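The ".409N" suggests that this exercise concerns move-to-front with probabilities $p_k = 2k/(N(N+1))$. Under that assumption (the question is not reproduced here), the sketch below compares the exact cost $1 + 2\sum_{i<j}p_ip_j/(p_i+p_j)$ with $\frac43N - \frac23(2N+1)(H_{2N}-H_N) + \frac56 - \frac13(N+1)^{-1}$. It also checks the harmonic-number sum $\sum_{k=1}^nk^2H_{n+k} = \frac{n(n+1)(2n+1)}{6}(2H_{2n}-H_n) - \frac{n(n+1)(10n-1)}{36}$. Names are hypothetical.

```python
from fractions import Fraction

def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def mtf_cost_linear(N):
    """Move-to-front cost 1 + 2*sum_{i<j} p_i p_j/(p_i + p_j) with p_k = 2k/(N(N+1))."""
    p = [Fraction(2 * k, N * (N + 1)) for k in range(1, N + 1)]
    return 1 + 2 * sum(p[i] * p[j] / (p[i] + p[j]) for i in range(N) for j in range(i + 1, N))

def closed_form_linear(N):
    return (Fraction(4, 3) * N - Fraction(2, 3) * (2 * N + 1) * (harmonic(2 * N) - harmonic(N))
            + Fraction(5, 6) - Fraction(1, 3) / (N + 1))

def sum_identity_holds(n):
    lhs = sum(k * k * harmonic(n + k) for k in range(1, n + 1))
    rhs = (Fraction(n * (n + 1) * (2 * n + 1), 6) * (2 * harmonic(2 * n) - harmonic(n))
           - Fraction(n * (n + 1) * (10 * n - 1), 36))
    return lhs == rhs

print(mtf_cost_linear(3), closed_form_linear(3))  # 337/180 337/180
```

Since $H_{2N} - H_N \to \ln 2$, the cost behaves like $\frac43(1 - \ln 2)N \approx 0.409N$.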

14. We may assume that $x_1 \le x_2 \le \cdots \le x_N$; then the maximum value occurs when $y_{a_1} \le y_{a_2} \le \cdots \le y_{a_N}$, and the minimum when $y_{a_1} \ge \cdots \ge y_{a_N}$, by an argument like that of Theorem S.

15. Arguing as in Theorem S, the arrangement $R_1R_2\ldots R_N$ is optimum if and only if

$p_1/L_1(1 - p_1) \ge \cdots \ge p_N/L_N(1 - p_N)$.
16. The expected time $T_1 + p_1T_2 + p_1p_2T_3 + \cdots + p_1p_2\ldots p_{N-1}T_N$ is minimized if and only if $T_1/(1 - p_1) \le \cdots \le T_N/(1 - p_N)$. [BIT 3 (1963), 255-256; some interesting extensions have been obtained by James R. Slagle, JACM 11 (1964), 253-264.]
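The ordering rule can be confirmed by brute force on a small instance. It rests on the usual adjacent-exchange argument: putting step $i$ just before step $j$ is no worse exactly when $T_i(1-p_j) \le T_j(1-p_i)$. In this sketch $p_i$ is the probability that the process goes on after step $i$; the data and names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

def expected_time(order, T, p):
    """T_a1 + p_a1 T_a2 + p_a1 p_a2 T_a3 + ...: a step is reached only if all earlier steps continue."""
    total, reach = 0, 1
    for i in order:
        total += reach * T[i]
        reach *= p[i]
    return total

def ratio_order(T, p):
    """Steps sorted by nondecreasing T_i / (1 - p_i); assumes every p_i < 1."""
    return sorted(range(len(T)), key=lambda i: T[i] / (1 - p[i]))

T = [3, 1, 4, 2, 5]
p = [Fraction(1, 2), Fraction(9, 10), Fraction(1, 5), Fraction(3, 5), Fraction(0)]
best = min(expected_time(o, T, p) for o in permutations(range(len(T))))
print(expected_time(ratio_order(T, p), T, p) == best)  # True
```

Steps with equal ratios can be interchanged freely, since swapping them leaves the expected time unchanged.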

17. Do the jobs in order of increasing deadlines, regardless of the respective times $T_j$! [Management Science Research Report 43, UCLA (1955). Of course in practice some jobs are more important than others, and we may want to minimize the maximum weighted tardiness. Or we may wish to minimize the sum $\sum_j\max(T_{a_1} + \cdots + T_{a_j} - D_{a_j}, 0)$. Neither of these problems appears to have a simple solution.]
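Here is a brute-force sanity check of the earliest-deadline rule. It computes the maximum tardiness $\max_j\max(T_{a_1} + \cdots + T_{a_j} - D_{a_j}, 0)$ on a small made-up instance; the names are hypothetical.

```python
from itertools import permutations

def max_tardiness(order, T, D):
    """Largest max(finish time - deadline, 0) when the jobs run back to back in `order`."""
    t, worst = 0, 0
    for j in order:
        t += T[j]                    # job j finishes at time t
        worst = max(worst, t - D[j])
    return worst

def edd_order(D):
    """Earliest deadline first."""
    return sorted(range(len(D)), key=lambda j: D[j])

T = [4, 2, 3, 1, 5]
D = [6, 3, 9, 4, 15]
print(max_tardiness(edd_order(D), T, D))  # 1
```

No ordering of these five jobs does better. The same check would fail for the weighted or summed variants mentioned in the answer, which is why those need different methods.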

18. Let $h = [s \text{ is present}]$. Let $A = \{j \mid q_j < r_j\}$, $B = \{j \mid q_j = r_j\}$, $C = \{j \mid q_j > r_j\}$, $D = \{j \mid t_j > 0\}$; then the sum $\sum p_ip_jd_{|i-j|}$ for the $(q, r)$ arrangement minus the corresponding sum for the $(q', r')$ arrangement is equal to

2 ^2  (<?<  — i’i)(qj  — rj)(d\i-j\  — dh+i+2k-i-j)  + 2 (qt—ri)tj(dh+2k-i+j  — di-i+j). 

i€A,jeC  i€C,j€D 

This is positive unless $C = \emptyset$ or $A \cup D = \emptyset$. The desired result now follows because the organ-pipe arrangements are the only permutations that are not improved by this construction and its left-right dual when $m = 0$, 1.

[This  result  is  essentially  due  to  G.  H.  Hardy,  J.  E.  Littlewood,  and  G.  Polya,  Proc. 
London  Math.  Soc.  (2),  25  (1926),  265-282,  who  showed,  in  fact,  that  the  minimum 
of $\sum p_iq_jd_{|i-j|}$ is achieved, under all independent arrangements of the $p$'s and $q$'s,
when  both  p’s  and  q’s  are  in  a consistent  organ-pipe  order.  For  further  commentary 
and  generalizations,  see  their  book  Inequalities  (Cambridge  University  Press,  1934), 
Chapter  10.] 

19. All arrangements are equally good. Assuming that $d(j,j) = 0$, we have

$$\sum_{i,j}p_ip_jd(i,j) = \sum_{i<j}p_ip_j\bigl(d(i,j) + d(j,i)\bigr) = \tfrac12\Bigl(1 - \sum_ip_i^2\Bigr)c.$$

[The special case $d(i,j) = 1 + (j - i) \bmod N$ for $i \ne j$ is due to K. E. Iverson,
A Programming  Language  (New  York:  Wiley,  1962),  138.  R.  L.  Baber,  JACM  10 
(1963),  478—486,  has  studied  some  other  problems  associated  with  tape  searching  when 
a tape  can  read  forward,  rewind,  or  backspace  k blocks  without  reading.  W.  D.  Frazer 
observes  that  it  is  possible  to  make  significant  reductions  in  the  search  time  if  we  are 
allowed  to  replicate  some  of  the  information  in  the  file;  see  E.  B.  Eichelberger,  W.  C. 
Rodgers,  and  E.  W.  Stacy,  IBM  J.  Research  & Development  12  (1968),  130-139,  for 
an  empirical  solution  to  a similar  problem.] 
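Here is a numerical illustration of the claim that all arrangements are equally good. It uses Iverson's distance $d(i,j) = 1 + (j - i) \bmod N$ for $i \ne j$, for which $d(i,j) + d(j,i) = N + 2$ is constant. The sketch assumes that $p$ is indexed by record and that an arrangement assigns records to positions; the probabilities are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def iverson_distance(i, j, n):
    """d(i, j) = 1 + (j - i) mod N for i != j, and d(j, j) = 0."""
    return 0 if i == j else 1 + (j - i) % n

def arrangement_cost(arrangement, p, d):
    """Sum over positions i, j of p[record at i] * p[record at j] * d(i, j)."""
    n = len(arrangement)
    return sum(p[arrangement[i]] * p[arrangement[j]] * d(i, j, n) for i in range(n) for j in range(n))

p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
print({arrangement_cost(a, p, iverson_distance) for a in permutations(range(4))})  # {Fraction(21, 10)}
```

All $4! = 24$ arrangements give the same total, $\frac12(1 - \sum p_i^2)(N + 2) = 21/10$.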

20. Going from $(q, r)$ to $(q', r')$ as in exercise 18, with $m = 0$ or $m = h = 1$, gives a
net  change  of 

~ - min(dh+l+2k-i-j  , di+j -1)) , 

i£A,jeC 

which is positive unless $A$ or $C$ is $\emptyset$. By circular symmetry it follows that the only
optimal  arrangements  are  cyclic  shifts  of  the  organ-pipe  configurations.  [For  a different 
class  of  problems  with  the  same  answer,  see  T.  S.  Motzkin  and  E.  G.  Straus,  Proc. 
Amer.  Math.  Soc.  7 (1956),  1014-1021.] 

21.  This  problem  was  essentially  first  solved  by  L.  H.  Harper,  SIAM  J.  Appl.  Math. 
12  (1964),  131-135.  For  generalizations  and  references  to  other  work,  see  J.  Applied 
Probability  4 (1967),  397-401. 
22. A priority queue of size 1000 (represented as, say, a heap; see Section 5.2.3). Insert the first 1000 records into this queue, with the element of greatest $d(K_j, K)$ at the front. For each subsequent $K_j$ with $d(K_j, K) < d(\text{front of queue}, K)$, replace the front element by $R_j$ and readjust the queue.
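A minimal Python sketch of this scheme using the standard `heapq` module, which provides a min-heap. Storing negated distances makes the front of the heap the record at the greatest distance. The default size of 1000 follows the answer, while the function name and the `dist` argument are hypothetical.

```python
import heapq

def nearest(records, key, dist, size=1000):
    """Return the `size` records nearest to `key`, closest first."""
    heap = []   # entries (-distance, index, record); heap[0] holds the greatest distance
    for idx, rec in enumerate(records):
        dd = dist(rec, key)
        if len(heap) < size:
            heapq.heappush(heap, (-dd, idx, rec))
        elif dd < -heap[0][0]:
            heapq.heapreplace(heap, (-dd, idx, rec))   # replace the front and readjust
    return [rec for _, _, rec in sorted(heap, reverse=True)]

print(nearest(range(100), 42, lambda a, b: abs(a - b), size=5))  # [42, 43, 41, 44, 40]
```

The index in each entry breaks ties between equal distances, so records themselves never need to be comparable.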

SECTION  6.2.1 

1. Prove inductively that $K_{l-1} < K < K_{u+1}$ whenever we reach step B2; and that $l \le i \le u$ whenever we reach B3.

2. (a, c) No; it loops if $l = u - 1$ and $K > K_u$. (b) Yes, it does work. But when $K$ is absent, there will often be a loop with $l = u$ and $K < K_u$.

3. This is Algorithm 6.1T with $N = 3$. In a successful search, that algorithm makes $(N+1)/2$ comparisons, on the average; in an unsuccessful search it makes $N/2 + 1 - 1/(N+1)$.

4.  It  must  be  an  unsuccessful  search  with  N = 127;  hence  by  Theorem  B the  answer 
is  138u. 

5. Program 6.1Q' has an average running time of $1.75N + 8.5 - (N \bmod 2)/4N$; this beats Program B if and only if $N < 44$. [It beats Program C only for $N < 11$.]

7. (a) Certainly not. (b) The parenthesized remarks in Algorithm U will hold true, so it will work, but only if $K_0 = -\infty$ and $K_{N+1} = +\infty$ are both present when $N$ is odd.

8. (a) $N$. It is interesting to prove this by induction, observing that exactly one of the $\delta$'s increases if we replace $N$ by $N + 1$. [See AMM 77 (1970), 884 for a generalization.] (b) Maximum $= \sum\delta_j = N$; minimum $= 2\delta_1 - \sum\delta_j = N \bmod 2$.

9. If and only if $N = 2^k - 1$.

10. Use a "macro-expanded" program with the DELTA's included; thus, for $N = 10$:

    START  ENT1  5
           LDA   K
           CMPA  KEY,1
           JL    C3A
    C4A    JE    SUCCESS           C3A   EQU   *
           INC1  3                       DEC1  3
           CMPA  KEY,1                   CMPA  KEY,1
           JL    C3B                     JGE   C4B
    C4B    JE    SUCCESS           C3B   EQU   *
           INC1  1                       DEC1  1
           CMPA  KEY,1                   CMPA  KEY,1
           JL    C3C                     JGE   C4C
    C4C    JE    SUCCESS           C3C   EQU   *
           INC1  1                       DEC1  1
           CMPA  KEY,1                   CMPA  KEY,1
           JE    SUCCESS                 JE    SUCCESS
           JMP   FAILURE                 JMP   FAILURE   |

[Exercise 23 shows that most of the "JE" instructions may be eliminated, yielding a program about $6\lg N$ lines long that takes only about $4\lg N$ units of time; but that program will be faster only for $N > 1000$ (approximately).]
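To see the macro-expanded program's logic in a higher-level form, here is a Python sketch of uniform binary search (Algorithm C). It is driven by the table $\delta_j = \lfloor (N + 2^{j-1})/2^j\rfloor$, which for $N = 10$ gives 5, 3, 1, 1, 0, the increments used in the program. Keys are stored in positions $1..N$ with a sentinel $K_0 = -\infty$ in position 0, because the index can reach 0 when $N$ is even. Names are hypothetical.

```python
def delta_table(N):
    """delta_j = floor((N + 2**(j-1)) / 2**j) for j = 1, 2, ..., ending with the first 0."""
    deltas, j = [], 1
    while True:
        d = (N + 2 ** (j - 1)) // 2 ** j
        deltas.append(d)
        if d == 0:
            return deltas
        j += 1

def uniform_binary_search(keys, K):
    """Algorithm C on keys[1..N] (keys[0] is a -inf sentinel); returns i with keys[i] == K, or None."""
    N = len(keys) - 1
    if N == 0:
        return None
    delta = delta_table(N)
    i, j = delta[0], 1                 # C1: delta[0] is the text's DELTA[1]
    while True:
        if K == keys[i]:               # C2: success
            return i
        if delta[j] == 0:              # C3/C4: the interval is exhausted
            return None
        i = i - delta[j] if K < keys[i] else i + delta[j]
        j += 1

keys = [float("-inf")] + [2 * k for k in range(1, 11)]
print(delta_table(10), uniform_binary_search(keys, 14), uniform_binary_search(keys, 15))  # [5, 3, 1, 1, 0] 7 None
```

The index always stays between $N \bmod 2$ and $N$, in line with exercise 8(b), so the sentinel in position 0 is the only extra key needed.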
11. Consider the corresponding tree, such as Fig. 6: When $N$ is odd, the left subtree of the root is a mirror image of the right subtree, so $K < K_i$ occurs just as often as $K > K_i$; on the average $C1 = \frac12(C + S)$ and $C2 = \frac12(C - S)$, $A = \frac12(1 - S)$. When $N$ is even, the tree is the same as the tree for $N + 1$ with all labels decreased by 1, except that (0) becomes redundant; on the average, letting $k = \lfloor\lg N\rfloor$, we have

$$C1 = \frac{C+1}{2} - \frac{k}{2N},\quad C2 = \frac{C-1}{2} + \frac{k}{2N},\quad A = 0,\qquad \text{if } S = 1;$$

$$C1 = \frac{(k+1)N}{2(N+1)},\quad C2 = \frac{(k+1)(N+2)}{2(N+1)},\quad A = \frac{N}{2(N+1)},\qquad \text{if } S = 0.$$


(The  average  value  of  C is  stated  in  the  text.) 



13.  N = 1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

ii 

12 

13 

14 

15 

16 

Cjv  = 1 

1| 

1* 

3 

H 

2? 

Z6 

2? 

3- 

3 

3 

3 

3 — 
6 12 

ii. 

J13 

3 — 

13  14 

3 A 
* 15 

4-L 

16 

C'N  = 1 

!§ 

2 

2| 

2! 

3 

3 

3- 

3 — 
^ 10 

Q JL 
25  li 

3 — 
^12 

4 

4 

4 

4 

4~ 

^17 

14. One idea is to find the least $M \ge 0$ such that $N + M$ has the form $F_{k+1} - 1$, then to start with $i \leftarrow F_k - M$ in step F1, and to insert "If $i \le 0$, go to F4" at the beginning of F2. A better solution would be to adapt Shar's idea to the Fibonaccian case: If the result of the very first comparison is $K > K_{F_k}$, set $i \leftarrow i - M$ and go to F4 (proceeding normally from then on). This avoids extra time in the inner loop.

15. The external nodes appear on levels $\lfloor k/2\rfloor$ through $k - 1$; the difference between these levels is greater than unity except when $k = 0, 1, 2, 3, 4$.

16. The Fibonacci tree of order k, with left and right reversed, is the binary tree corresponding to the lineal chart up to the kth month, under the “natural correspondence” of Section 2.3.2, if we remove the topmost node of the lineal chart.

17. Let the path length be $k - A(n)$; then $A(F_j) = j$ and $A(F_j + m) = 1 + A(m)$ when $0 < m < F_{j-1}$.


18. Successful search: $A_k = 0$, $C_k = \bigl(3kF_{k+1} + (k-4)F_k\bigr)/\bigl(5(F_{k+1} - 1)\bigr) - 1$, $C1_k = C_{k-1}(F_k - 1)/(F_{k+1} - 1)$. Unsuccessful search: $A'_k = F_k/F_{k+1}$, $C'_k = \bigl(3kF_{k+1} + (k-4)F_k\bigr)/(5F_{k+1})$, $C1'_k = C'_{k-1}F_k/F_{k+1} + F_{k-1}/F_{k+1}$. $C2 = C - C1$. (See exercise 1.2.8-12 for the solution to related recurrences.)

20. (a) $b = p^{-p}q^{-q}$. (b) There are at least two errors. The first blunder is that division is not a linear function, so it can’t simply be “averaged over.” Actually with probability p we get pN elements remaining, and with probability q we get qN, so we can expect to get $(p^2 + q^2)N$; thus the average reduction factor is really $1/(p^2 + q^2)$. Now the reduction factor after k iterations is $1/(p^2 + q^2)^k$, but we cannot conclude that $b = 1/(p^2 + q^2)$, since the number of iterations needed to locate some of the items is much more than the number needed to locate others. This is a second fallacy. [It is very easy to make plausible but fallacious probability arguments, and we must always be on our guard against such pitfalls!]

21.  It’s  impossible,  since  the  method  depends  on  the  key  values. 

22. FOCS 17 (1976), 173-177. See also Y. Perl, A. Itai, and H. Avni, CACM 21 (1978), 550-554; G. H. Gonnet, L. D. Rogers, and J. A. George, Acta Informatica 13 (1980), 39-52; G. Louchard, RAIRO Inform. Theor. 17 (1983), 365-385; Computing 46 (1991), 193-222. The variance is $O(\log\log N)$. Extensive empirical tests by G. Marsaglia and B. Narasimhan, Computers and Math. 26,8 (1993), 31-42, show that the average number of table accesses is very close to $\lg\lg N$, plus about 0.7 if the search is unsuccessful. When $N = 2^{20}$, for example, a random successful search in a random table takes about 4.29 accesses, while a random unsuccessful search takes about 5.05.

23. Go to the right on $\ge$, to the left on $<$; when reaching node [i] it follows from (1) that $K_i \le K < K_{i+1}$, so a final test for equality will distinguish between success or failure. (The key $K_0 = -\infty$ should always be present.)

Algorithm C would be changed to go to C4 if $K = K_i$ in step C2. In C3 if DELTA[j] = 0, set $i \gets i - 1$ and go to C5. In C4 if DELTA[j] = 0, go directly to C5. Add a new step C5: “If $K = K_i$, the algorithm terminates successfully, otherwise it terminates unsuccessfully.” [This would not speed up Program C unless $N > 2^{26}$; the average successful search time changes from $(8.5\lg N - 6)u$ to $(8\lg N + 7)u$.]

24. The keys can be arranged so that we first set $i \gets 1$, then $i \gets 2i$ or $2i + 1$ according as $K < K_i$ or $K > K_i$; the search is unsuccessful when $i > N$. For example, when N = 12 the necessary key arrangement is

$$K_8 < K_4 < K_9 < K_2 < K_{10} < K_5 < K_{11} < K_1 < K_{12} < K_6 < K_3 < K_7.$$

When programmed for MIX this method will take only about $6\lg N$ units of time, so it is faster than Program C. The only disadvantage is that it is a little tricky to set up the table in the first place.
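The arrangement in answer 24 is what is nowadays often called an Eytzinger (or BFS) layout. In this Python sketch (names hypothetical), `heap_layout` places a sorted list so that the implicit tree with children 2i and 2i + 1 lists the keys in symmetric order, and `heap_search` applies the rule i ← 2i or 2i + 1 stated above; for N = 12 it reproduces the order $K_8 < K_4 < \cdots < K_7$.

```python
def heap_layout(sorted_keys):
    """Return t with t[0] unused and t[1..N] arranged so that the implicit tree
    (children of i are 2i and 2i + 1) lists sorted_keys in symmetric order."""
    n = len(sorted_keys)
    t = [None] * (n + 1)
    source = iter(sorted_keys)

    def fill(i):                     # symmetric-order walk of the implicit tree
        if i <= n:
            fill(2 * i)
            t[i] = next(source)
            fill(2 * i + 1)

    fill(1)
    return t

def heap_search(t, k):
    """Start at i = 1; go to 2i if K < K_i, to 2i + 1 if K > K_i; fail when i > N."""
    n = len(t) - 1
    i = 1
    while i <= n:
        if k < t[i]:
            i = 2 * i
        elif k > t[i]:
            i = 2 * i + 1
        else:
            return i
    return None

t = heap_layout(list(range(1, 13)))               # N = 12, keys 1..12
print(sorted(range(1, 13), key=lambda i: t[i]))
# [8, 4, 9, 2, 10, 5, 11, 1, 12, 6, 3, 7]
```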

25. (a) Since $a_0 = 1 - b_0$, $a_1 = 2a_0 - b_1$, $a_2 = 2a_1 - b_2$, etc., we have $A(z) + B(z) = 1 + 2zA(z)$. Several of the formulas derived in Section 2.3.4.5 follow immediately from this relation by considering $A(1)$, $B(1)$, $B(\frac12)$, $A'(1)$, and $B'(1)$. If we use two variables to distinguish left and right steps of a path we obtain the more general result $A(x, y) + B(x, y) = 1 + (x + y)A(x, y)$, a special case of a formula that holds in t-ary trees [see R. M. Karp, IRE Transactions IT-7 (1961), 27-38].

(b) $\operatorname{var}(g) = \bigl((N + 1)/N\bigr)\operatorname{var}(h) - \bigl((N + 1)/N^2\bigr)\operatorname{mean}(h)^2 + 2$.

26. The merge tree for the three-tape polyphase merge with a perfect level k distribution is the Fibonacci tree of order k + 1 if we permute left and right appropriately. (Redraw the polyphase tree of Fig. 76 in Section 5.4.4, with the left and right subtrees of A and C reversed, obtaining Fig. 8.)

27. At most k + 1 of the $2^k$ outcomes will ever occur, since we may order the indices in such a way that $K_{i_1} \le K_{i_2} \le \cdots \le K_{i_k}$. Thus the search can be described by a tree with at most $(k+1)$-way branching at each node. The number of items that can be found on the mth step is at most $k(k+1)^{m-1}$; hence the average number of comparisons is at least $N^{-1}$ times the sum of the smallest N elements of the multiset $\{k\cdot1,\ k(k+1)\cdot2,\ k(k+1)^2\cdot3,\ \ldots\}$. When $N \ge (k+1)^n - 1$, the average number of comparisons is $\ge \bigl((k+1)^n - 1\bigr)^{-1}\sum_{m=1}^{n} k(k+1)^{m-1}m > n - 1/k$.
28. [Skrifter udgivne af Videnskabs-Selskabet i Christiania, Mathematisk-Naturvidenskabelig Klasse (1910), No. 8; reprinted in Thue’s Selected Mathematical Papers (Oslo: Universitetsforlaget, 1977), 273-310.] (a) $T_n$ has $F_{n+1} + F_{n-1} = F_{2n}/F_n$ leaves. (This is the so-called Lucas number $L_n = \phi^n + \hat\phi^n$.) (b) The axiom says that $T_0(T_2(x)) = T_1(x)$, and we obviously have $T_m(T_n(x)) = T_{m+n-1}(x)$ when m = 1 or n = 1. By induction on n, the result holds when m = 0; for example, $T_0(T_3(x)) = T_0(T_2(x) * T_1(x)) = T_0\bigl(T_1(T_2(x)) * T_0(T_2(x))\bigr) = T_0(T_2(T_2(x))) = T_2(x)$. Finally we can use induction on m.

29. Assume that $K_0 = -\infty$ and $K_{N+1} = K_{N+2} = \infty$. First do a binary search on $K_2 < K_4 < \cdots$; this takes at most $\lfloor\lg N\rfloor$ comparisons. If unsuccessful, it determines an interval with $K_{2j-2} < K < K_{2j}$; and K is not present if $2j = N + 2$. Otherwise, a binary search for $K_{2j-1}$ will determine i such that $K_{2i-2} < K_{2j-1} < K_{2i}$. Then either $K = K_{2i-1}$, or K is not present. [See Theor. Comp. Sci. 58 (1988), 67.]

30. Let $n = \lfloor N/4\rfloor$. Starting with $K_1 < K_2 < \cdots < K_N$, we can put $K_1, K_3, \ldots, K_{2n-1}$ into any desired order by swapping them with a permutation of $K_{2n+1}, K_{2n+3}, \ldots, K_{4n-1}$; this arrangement satisfies the conditions of the previous exercise. Now we let $K_1 < K_3 < \cdots < K_{2^{t+1}-3}$ represent the boundaries between all possible t-bit numbers, and we insert $K_{2^{t+1}-1}, K_{2^{t+1}+1}, \ldots, K_{2^{t+1}+2m-3}$ between these “fenceposts” according to the values of $x_1, x_2, \ldots, x_m$. For example, if m = 4, t = 3, $x_1 = (001)_2$, $x_2 = (111)_2$, and $x_3 = x_4 = (100)_2$, the desired order is

$$K_1 < K_{15} < K_3 < K_5 < K_7 < K_{19} < K_{21} < K_9 < K_{11} < K_{13} < K_{17}.$$

(We could also let $K_{21}$ precede $K_{19}$.) A binary search for $K_{2^{t+1}+2j-3}$ in the subarray $K_1 < K_3 < \cdots < K_{2^{t+1}-3}$ will now find the bits of $x_j$ from left to right. [See Fiat, Munro, Naor, Schaffer, Schmidt, and Siegel, J. Comp. Syst. Sci. 43 (1991), 406-424.]

SECTION  6.2.2 

1. Use a header node, with say ROOT = RLINK(HEAD); start the algorithm at step T4 with P ← HEAD. Step T5 should act as if K > KEY(HEAD). [Thus, change lines 04 and 05 of Program T to “ENT1 ROOT; CMPA K”.]

2. In step T5, set RTAG(Q) ← 1. Also, when inserting to the left, set RLINK(Q) ← P; when inserting to the right, set RLINK(Q) ← RLINK(P) and RTAG(P) ← 0. In step T4, change the test “RLINK(P) ≠ Λ” to “RTAG(P) = 0”. [If nodes are inserted into successively increasing locations Q, and if all deletions are last-in-first-out, the RTAG fields can be eliminated since RTAG(P) will be 1 if and only if RLINK(P) < P. Similar remarks apply with simultaneous left and right threading.]

3. We could replace Λ by a valid address, and set KEY(Λ) ← K at the beginning of the algorithm; then the tests for LLINK or RLINK = Λ could be removed from the inner loop. However, in order to do a proper insertion we need to introduce another pointer variable that follows P; this can be done without losing the stated speed advantage, by duplicating the code as in Program 6.2.1F. Thus the MIX time would be reduced to about 5.5C units.
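With references instead of MIX addresses, the idea of answer 3 can be sketched by letting one shared sentinel node stand for Λ. Before each search its key is set to K, so the inner loop performs only key comparisons, while a trailing pointer `parent` follows P so that a proper insertion is still possible. The Python class and method names are hypothetical, and the 5.5C timing applies to MIX only.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, link):
        self.key, self.left, self.right = key, link, link

class SentinelTree:
    def __init__(self):
        self.nil = Node(None, None)      # stands for the null link Λ
        self.root = self.nil

    def search_insert(self, k):
        """Return True if k was already present; otherwise insert k and return False."""
        nil = self.nil
        nil.key = k                      # KEY(Λ) ← K guarantees that the loop stops
        parent, p = None, self.root
        while k != p.key:                # no separate test for p == Λ
            parent = p
            p = p.left if k < p.key else p.right
        if p is not nil:
            return True
        q = Node(k, nil)                 # both links of the new node point to Λ
        if parent is None:
            self.root = q
        elif k < parent.key:
            parent.left = q
        else:
            parent.right = q
        return False

    def keys(self):
        """Keys in symmetric order (iterative traversal)."""
        out, stack, p = [], [], self.root
        while stack or p is not self.nil:
            while p is not self.nil:
                stack.append(p)
                p = p.left
            p = stack.pop()
            out.append(p.key)
            p = p.right
        return out

demo = SentinelTree()
found = [demo.search_insert(k) for k in [5, 3, 8, 1, 4, 3, 8]]
print(found, demo.keys())
# [False, False, False, False, False, True, True] [1, 3, 4, 5, 8]
```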

4. $C_N = 1 + \bigl(0\cdot1 + 1\cdot2 + \cdots + (n-1)2^{n-1} + C'_{2^n-1} + \cdots + C'_{N-1}\bigr)/N = (1 + 1/N)C'_N - 1$, for $N \ge 2^n - 1$. The solution to these equations is $C'_N = 2(H_{N+1} - H_{2^n}) + n$ for $N \ge 2^n - 1$, a savings of $2H_{2^n} - n - 2 \approx n(\ln 4 - 1)$ comparisons. The actual improvement for n = 1, 2, 3, 4 is, respectively, 0, 1/6, 61/140 ≈ 0.44, and about 0.76; thus comparatively little is gained for small fixed n. [See Frazer and McKellar, JACM 17 (1970), 502, for a more detailed derivation related to an equivalent sorting problem.]
5. (a) The first element must be CAPRICORN; then we multiply the number of ways to produce the left subtree by the number of ways to produce the right subtree, times $\binom{10}{3}$, the number of ways to shuffle those two sequences together.

[In general, the answer is the product, over all nodes, of $\binom{l+r}{l}$, where l and r stand for the sizes of the left and right subtrees of the node. This is equal to N! divided by the product of the subtree sizes. It is the same formula as in exercise 5.1.4-20; indeed, there is an obvious one-to-one correspondence between the permutations that yield a particular search tree and the “topological” permutations counted in that exercise, if we replace $a_k$ in the search tree by k (using the notation of exercise 6).] (b) $2^{N-1} = 1024$; at each step but the last, insert either the smallest or largest remaining key.
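The counting rule in answer 5 is easy to check by brute force for small N. This Python sketch (helper names hypothetical) builds a search tree by straight insertion, evaluates the product of $\binom{l+r}{l}$ over all nodes, compares it with N! divided by the product of the subtree sizes, and confirms the $2^{N-1}$ count of part (b) for N = 6, where it is 32.

```python
from itertools import permutations
from math import comb, factorial

def build(perm):
    """Insert the keys of perm into an empty binary search tree.
    A tree is None (empty) or a tuple (left, key, right)."""
    def insert(t, k):
        if t is None:
            return (None, k, None)
        left, key, right = t
        if k < key:
            return (insert(left, k), key, right)
        return (left, key, insert(right, k))
    tree = None
    for k in perm:
        tree = insert(tree, k)
    return tree

def size(t):
    return 0 if t is None else size(t[0]) + 1 + size(t[2])

def height(t):
    return 0 if t is None else 1 + max(height(t[0]), height(t[2]))

def orders(t):
    """Product over all nodes of C(l + r, l), where l, r are the subtree sizes."""
    if t is None:
        return 1
    l, r = size(t[0]), size(t[2])
    return comb(l + r, l) * orders(t[0]) * orders(t[2])

def orders_by_sizes(t):
    """N! divided by the product of the sizes of all subtrees."""
    def product(u):
        return 1 if u is None else size(u) * product(u[0]) * product(u[2])
    return factorial(size(t)) // product(t)

t = build((4, 2, 6, 1, 3, 5))
brute = sum(build(p) == t for p in permutations(range(1, 7)))
print(orders(t), orders_by_sizes(t), brute)                            # 20 20 20
print(sum(height(build(p)) == 6 for p in permutations(range(1, 7))))   # 32
```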

6. (a) For each of the $P_{nk}$ permutations $a_1 \ldots a_{n-1}a_n$ whose cost is k, construct n + 1 permutations $a'_1 \ldots a'_{n-1}\,m\,a'_n$, where $a'_j = a_j$ or $a_j + 1$, according as $a_j < m$ or $a_j \ge m$. [See Section 1.2.5, Method 2.] If $m = a_n$ or $a_n + 1$, this permutation has a cost of k + 1, otherwise it has a cost of k. (b) $G_n(z) = (2z + n - 2)(2z + n - 3)\ldots(2z)$. Hence


This generating function was, in essence, obtained by W. C. Lynch, Comp. J. 7 (1965), 299-302. (c) The generating function for probabilities is $g_n(z) = G_n(z)/n!$. This is a product of simple probability generating functions, so the variance of $C'_{n-1}$ is

$$\operatorname{var}(g_n) = \sum_{k=0}^{n-2}\operatorname{var}\Bigl(\frac{2z + k}{k + 2}\Bigr) = \sum_{k=0}^{n-2}\Bigl(\frac{2}{k+2} - \frac{4}{(k+2)^2}\Bigr) = 2H_n - 4H_n^{(2)} + 2.$$


[By exercise 6.2.1-25(b) we can use the mean and variance of $C'_n$ to compute the variance of $C_n$, which is $(2 + 10/n)H_n - 4(1 + 1/n)\bigl(H_n^{(2)} + H_n^2/n\bigr) + 4$; this formula is due to G. D. Knott.]
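Knott's formula can be confirmed exactly for small n. In this Python sketch (names hypothetical) every permutation of n keys is inserted into an empty tree, the number of comparisons needed to find each key is recorded, and the variance over all (permutation, key) pairs is compared with $(2 + 10/n)H_n - 4(1 + 1/n)(H_n^{(2)} + H_n^2/n) + 4$. The check assumes that the variance of $C_n$ means the variance for a random successful search in a random tree; the mean is also compared with the familiar $2(1 + 1/n)H_n - 3$.

```python
from fractions import Fraction
from itertools import permutations

def harmonic(n, r=1):
    """H_n^(r) = 1 + 1/2^r + ... + 1/n^r as an exact fraction."""
    return sum(Fraction(1, k**r) for k in range(1, n + 1))

def comparison_counts(perm):
    """Insert perm into an empty tree; return {key: comparisons needed to find key}."""
    left, right = {}, {}
    root = perm[0]
    cost = {root: 1}
    for k in perm[1:]:
        p, d = root, 1
        while True:
            child = left if k < p else right
            if p in child:
                p, d = child[p], d + 1
            else:
                child[p] = k
                cost[k] = d + 1
                break
    return cost

def exact_mean_and_variance(n):
    values = [c for perm in permutations(range(n))
                for c in comparison_counts(perm).values()]
    mean = Fraction(sum(values), len(values))
    return mean, Fraction(sum(c * c for c in values), len(values)) - mean**2

def knott_variance(n):
    h1, h2 = harmonic(n), harmonic(n, 2)
    return (2 + Fraction(10, n)) * h1 - 4 * (1 + Fraction(1, n)) * (h2 + h1**2 / n) + 4

for n in range(1, 7):
    mean, var = exact_mean_and_variance(n)
    assert mean == 2 * (1 + Fraction(1, n)) * harmonic(n) - 3
    assert var == knott_variance(n)
print(exact_mean_and_variance(3))       # (Fraction(17, 9), Fraction(44, 81))
```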

7. A comparison with the kth largest element will be made if and only if that element occurs before the mth and before all those between the kth and mth; this happens with probability $1/(|m - k| + 1)$. Summing over k gives the answer $H_m + H_{n+1-m} - 1$. [CACM 12 (1969), 77-80; see also L. Guibas, Acta Informatica 4 (1975), 293-298.]
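The formula of answer 7 can be checked by simulating the searches themselves. In this Python sketch (names hypothetical) every permutation of 1, ..., n is inserted into a tree, the comparisons needed to find the mth smallest key are counted, and the average is compared with $H_m + H_{n+1-m} - 1$.

```python
from fractions import Fraction
from itertools import permutations

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

def comparisons_to_find(perm, m):
    """Insert perm into an empty tree, then count the comparisons in a search for m."""
    left, right = {}, {}
    root = perm[0]
    for k in perm[1:]:
        p = root
        while True:
            child = left if k < p else right
            if p in child:
                p = child[p]
            else:
                child[p] = k
                break
    count, p = 1, root
    while p != m:
        p = left[p] if m < p else right[p]
        count += 1
    return count

def average(n, m):
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(comparisons_to_find(p, m) for p in perms), len(perms))

print(average(3, 1), average(3, 2))       # 11/6 2
```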

8. (a) $g_n(z) = z^{n-1}\sum_{k=1}^{n} g_{k-1}(z)g_{n-k}(z)/n$, $g_0(z) = 1$.

(b) $7n^2 - 4(n+1)^2H_n^{(2)} - 2(n+1)H_n + 13n$. [P. F. Windley, Comp. J. 3 (1960), 86, gave recurrence relations from which this variance could be computed numerically, but he did not obtain the solution. Notice that this result is not simply related to the variance of $C_n$ stated in the answer to exercise 6.]

10. For example, each word x of the key could be replaced by $ax \bmod m$, where m is the computer word size and a is a random multiplier relatively prime to m. A value near to $(\phi - 1)m$ can be recommended (see Section 6.4). The flexible storage allocation of a tree method may make it more attractive than a hash coding scheme.

11. N − 2; but this occurs with probability $1/(N\cdot N!)$, only in the deletion of the element 1 from the permutation $1\;N\;(N-1)\;\ldots\;2$.

12. $\frac12(n+1)(n+2)$ of the deletions in the proof of Theorem H belong to Case 1, so the answer is $(N + 1)/2N$.
13. Yes. In fact, the proof of Theorem H shows that if we delete the kth element inserted, for any fixed k, the result is random. (G. D. Knott [Ph.D. thesis, Stanford, 1975] showed that the result is random after an arbitrary sequence of random insertions followed by successive deletion of the $(k_1, \ldots, k_d)$th elements inserted, for any fixed sequence $k_1, \ldots, k_d$.)

14. Let NODE(T) be on level k, and let LLINK(T) = Λ, RLINK(T) = $R_1$, LLINK($R_1$) = $R_2$, ..., LLINK($R_d$) = Λ, where $R_j \ne \Lambda$ and $d \ge 1$. Let NODE($R_i$) have $n_i$ internal nodes in its right subtree, for $1 \le i \le d$. With step D1.5 the internal path length decreases by $k + d + n_1 + \cdots + n_d$; without that step it decreases by $k + d + n_d$.

15. 11, 13, 25, 11, 12. [If $a_j$ is the (smallest, middle, largest) of $\{a_1, a_2, a_3\}$, the tree is obtained (4, 2, 3) × 4 times after the deletion.]

16.  Yes;  even  the  deletion  operation  on  permutations,  as  defined  in  the  proof  of 
Theorem  H,  is  commutative  (if  we  omit  the  renumbering  aspect).  If  there  is  an  element 
between  X and  Y,  deletion  is  obviously  commutative  since  the  operation  is  affected  only 
by  the  relative  positions  of  X,  Y,  and  their  successors  and  there  is  no  interaction  between 
the  deletion  of  X and  the  deletion  of  Y.  On  the  other  hand,  if  Y is  the  successor  of  X, 
and  Y is  the  largest  element,  both  orders  of  deletion  have  the  effect  of  simply  removing 
X and  Y.  If  Y is  the  successor  of  X and  Z the  successor  of  Y,  both  orders  of  deletion 
have  the  effect  of  replacing  the  first  occurrence  of  X,  Y,  or  Z by  Z and  deleting  the 
second  and  third  occurrences  of  these  elements  within  the  permutation. 

18.  Use  exercise  1.2.7-14. 

19. $2H_N - 1 - 2\sum_{k=1}^{N}(k-1)^\theta/(kN^\theta) = 2H_N - 1 - 2/\theta + O(N^{-\theta})$. [The Pareto distribution 6.1-(13) also gives the same asymptotic result, to within $O(N^{-\theta}\log N)$.]

20. Yes indeed. Assume that $K_1 < \cdots < K_N$, so that the tree built by Algorithm T is degenerate; if, say, $p_k = \bigl(1 + ((N+1)/2 - k)\epsilon\bigr)/N$, the average number of comparisons is $(N+1)/2 - (N^2 - 1)\epsilon/12$, while the optimum tree requires fewer than $\lceil\lg N\rceil$ comparisons.

21. 1, 1. (Most of the angles are 30°, 60°, or 90°.)

22. This is obvious when d = 2, and for d > 2 we had $r[i, j-1] \le r[i+1, j-1] \le r[i+1, j]$.


[Increasing  the  weight  of  the  first  node  will  eventually  make  it  move  to  the  root  position; 
this  suggests  that  dynamically  maintaining  a perfectly  optimum  tree  is  hard.] 

24. Let c be the cost of a tree obtained by deleting the nth node of an optimum tree. Then $c(0, n-1) \le c \le c(0, n) - q_{n-1}$, since the deletion operation always moves [n−1] up one level. Also $c(0, n) \le c(0, n-1) + q_{n-1}$, since the stated replacement yields a tree of the latter cost. It follows that $c(0, n-1) = c = c(0, n) - q_{n-1}$.
25. (a) Assume that A < B and B < C, and let $a \in A$, $b \in B$, $c \in C$, $c < a$. If $c < b$ then $c \in B$; hence $c \in A$ and $a \in B$; hence $a \in C$. If $c > b$, then $a \in B$; hence $a \in C$ and $c \in B$; hence $c \in A$. (b) Not hard to prove.

26.  The  cost  of  every  tree  has  the  form  y + lx  for  some  real  y > 0 and  integer  l > 0. 
The  minimum  of  a finite  number  of  such  functions  (taken  over  all  trees)  always  has  the 
form  described. 

27. (a) The answer to exercise 24 (especially the fact that $c = c(0, n-1)$) implies that $R(0, n-1) = R(0, n) \setminus \{n\}$.

(b) If $l = l'$, the result in the hint is trivial. Otherwise let the paths to [n] be


Since $r = r_0 > s_0 = s$ and $r_{l'} < s_l = n$, we can find a level $k \ge 0$ such that $r_k > s_k$ and $r_{k+1} < s_{k+1}$. We have $r_{k+1} \in R(r_k, n)$, $s_{k+1} \in R(s_k, n)$, and $R(s_k, n) < R(r_k, n)$ by induction, hence $r_{k+1} \in R(s_k, n)$ and $s_{k+1} \in R(r_k, n)$; the result in the hint follows.

Now to prove that $R'_h < R_h$, let $r \in R'_h$, $s \in R_h$, $s < r$, and consider the optimum trees shown when $x = x_h$; we must have $l > l_h$ and we may assume that $l' = l_h$. To prove that $R_h < R'_{h+1}$, let $r \in R_h$, $s \in R'_{h+1}$, $s < r$, and consider the optimum trees shown when $x = x_{h+1}$; we must have $l' < l_h$ and we may assume that $l = l_h$.

29. It is a degenerate tree (see exercise 5) with YOU at the top, THE at the bottom, needing 19.158 comparisons on the average.

Douglas A. Hamilton has proved that some degenerate tree is always worst. Therefore an $O(n^2)$ algorithm exists to find pessimal binary search trees.

30.  See  R.  L.  Wessner,  Information  Processing  Letters  4 (1976),  90-94;  F.  F.  Yao, 
SIAM  J.  Algebraic  and  Discrete  Methods  3 (1982),  532-540. 

31.  See  Acta  Informatica  1 (1972),  307-310. 

32.  When  M is  large  enough,  the  optimum  tree  must  have  the  stated  form  and  the 
minimum  cost  must  be  M times  the  minimum  external  path  length  plus  the  solution 
to  the  stated  problem. 

[Notes: The paper by Wessner cited in answer 30 explains how to find optimum binary search trees of height ≤ L. In the special case $p_1 = \cdots = p_n = 0$, the stated result is due to T. C. Hu and K. C. Tan, MRC Report 1111 (Univ. of Wisconsin, 1970). A. M. Garsia and M. L. Wachs proved that in this case all external nodes will appear on at most two levels if $\min_{k=1}^{n}(q_{k-1} + q_k) > \max_{k=0}^{n} q_k$, and they presented an algorithm that needs only $O(n)$ steps to find an optimum two-level tree.]

33.  For  the  stated  problem,  see  A.  Itai,  SICOMP  5 (1976),  9-18.  For  the  alternatives, 
see  D.  Spuler,  Acta  Informatica  31  (1994),  729-740. 

34. It equals $2^{NH(p_1, \ldots, p_n)}(2\pi N)^{(1-n)/2}(p_1 \ldots p_n)^{-1/2}\bigl(1 + O(1/N)\bigr)$, if $p_1 \ldots p_n \ne 0$, by Stirling’s approximation.

35.  The  minimum  value  of  the  right-hand  side  occurs  when  2x  = (1  — p)/p,  and  it 
equals  1 — p + H(p,  1 — p).  But  H(p,q,r)  < 1 — p + H (p,  1 — p),  by  (20)  with  k = 2. 

36. First we prove the hint, which is due to Jensen [Acta Math. 30 (1906), 175-193]. If f is concave, the function $g(p) = f(px + (1-p)y) - pf(x) - (1-p)f(y)$ is concave and satisfies $g(0) = g(1) = 0$. If $g(p) < 0$ and $0 < p < 1$ there must be a value $p_0 < p$ with $g'(p_0) < 0$ and a value $p_1 > p$ with $g'(p_1) > 0$, by the mean value theorem; but this contradicts concavity. Therefore $f(px + (1-p)y) \ge pf(x) + (1-p)f(y)$ for $0 < p < 1$, a fact that is also geometrically obvious. Now we can prove by induction that $f(p_1x_1 + \cdots + p_nx_n) \ge p_1f(x_1) + \cdots + p_nf(x_n)$, since

$$f(p_1x_1 + \cdots + p_nx_n) \ge p_1f(x_1) + \cdots + p_{n-2}f(x_{n-2}) + (p_{n-1} + p_n)f\Bigl(\frac{p_{n-1}x_{n-1} + p_nx_n}{p_{n-1} + p_n}\Bigr)\qquad\text{if } n > 2.$$

By Lemma E we have

$$H(XY) = H(X) + \sum_{i=1}^{n} p_iH(r_{i1}/p_i, \ldots, r_{in}/p_i);$$

and the latter sum is $\sum_{i=1}^{n}\sum_{j=1}^{n} p_if(r_{ij}/p_i) \le \sum_{j=1}^{n} f\bigl(\sum_{i=1}^{n} r_{ij}\bigr) = H(Y)$, where $f(x) = x\lg(1/x)$ is concave.

37. By part (a) of exercise 3.3.2-26, we have $\Pr(P_1 > s) = (1 - s)^{n-1}$. Therefore $E\,H(P_1, \ldots, P_n) = n\,E\,P_1\lg(1/P_1) = n\int_0^1(1 - s)^{n-1}\,d\bigl(s\lg(1/s)\bigr) = -(A + B)/\ln 2$, where $A = n\int_0^1(1 - s)^{n-1}\,ds = 1$ and

$$B = n\int_0^1(1 - s)^{n-1}\ln s\,ds = -H_n$$

by exercise 1.2.7-13. Thus the answer is $(H_n - 1)/\ln 2$. (This is $\lg n + (\gamma - 1)/\ln 2 + O(n^{-1})$, very near the maximum entropy $H(\frac1n, \ldots, \frac1n) = \lg n$. Therefore $H(P_1, \ldots, P_n)$ is $O(\log n)$ with high probability.)

38. If = $s_k$ we have $q_{k-1} = p_k = q_k = 0$; see (26). Construct a tree for the n − 1 probabilities $(p_1, \ldots, p_{k-1}, p_{k+1}, \ldots, p_n;\ q_0, \ldots, q_{k-1}, q_{k+1}, \ldots, q_n)$, and replace leaf [k−1] by a 2-leaf subtree.

39. We can argue as in Theorem M, if $0 < w_1 \le w_2 \le \cdots \le w_n$ and $s_k = w_1 + \cdots + w_k$, because $w_k \ge 2^{-t}$ implies that $s_{k-1} + 2^{-t} \le s_k \le s_{k+1} - 2^{-t}$ when the weights are ordered; hence we have $|\sigma_k| < 1 + \lg(1/w_k)$. [This result, together with the matching lower bound $H(w_1, \ldots, w_n)$, was Theorem 9 in Shannon’s original paper of 1948.]
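A concrete version of the construction behind Shannon's Theorem 9 may help. The Python sketch below uses Shannon's classical variant, which is an assumption here rather than the exact argument of Theorem M: list the weights in decreasing order, and give a key of weight w the first $\lceil\lg(1/w)\rceil$ bits of the binary expansion of the total weight of the keys listed before it. Exact `Fraction` arithmetic avoids rounding trouble; the names are hypothetical.

```python
import math
from fractions import Fraction

def shannon_code(weights):
    """weights: positive Fractions summing to 1. Returns codewords in input order."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    codes = [None] * len(weights)
    before = Fraction(0)                  # total weight of the keys already coded
    for i in order:
        w = weights[i]
        t = 0
        while w * 2**t < 1:               # t = ceil(lg(1/w)), computed exactly
            t += 1
        bits, x = [], before
        for _ in range(t):                # first t bits of the binary expansion
            x *= 2
            bit = 1 if x >= 1 else 0
            bits.append(str(bit))
            x -= bit
        codes[i] = "".join(bits)
        before += w
    return codes

def is_prefix_free(codes):
    return not any(i != j and codes[j].startswith(codes[i])
                   for i in range(len(codes)) for j in range(len(codes)))

w = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
codes = shannon_code(w)
average_length = sum(wi * len(c) for wi, c in zip(w, codes))      # 12/5
entropy = -sum(float(wi) * math.log2(wi) for wi in w)             # about 1.846
print(codes, average_length, round(entropy, 3))
# ['00', '01', '101', '1110'] 12/5 1.846
```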

40. If k = s + 3, the stated rearrangement changes the cost from $q_{k-1}l + q_kl + q_{k-2}l_{k-2}$ to $q_{k-2}l + q_{k-1}l + q_kl_{k-2}$, so the net change is $(q_{k-2} - q_k)(l - l_{k-2})$; this is negative if $l < l_{k-2}$, because $q_{k-2} > q_k$.

Similarly, if $k \ge s + 4$ the rearrangement changes the cost by

$$\delta = q_{s+1}(l - l_{s+1}) + q_{s+2}(l - l_{s+2}) + q_{s+3}(l_{s+1} - l_{s+3}) + \cdots + q_{k-2}(l_{k-4} - l_{k-2}) + q_{k-1}(l_{k-3} - l) + q_k(l_{k-2} - l).$$

We have $q_{s+1} > q_{s+3}$, $q_{s+2} > q_{s+4}$, ..., $q_{k-2} > q_k$. Therefore we find

$$\delta \le (q_{k-2} - q_k)(l - l_{k-2}) + (q_{k-3} - q_{k-1})(l - l_{k-3}) \le 0;$$

for example, when k − s is even we have

$$\delta \le q_{k-3}(l - l_{s+1}) + q_{k-2}(l - l_{s+2}) + q_{k-3}(l_{s+1} - l_{s+3}) + \cdots + q_{k-2}(l_{k-4} - l_{k-2}) + q_{k-1}(l_{k-3} - l) + q_k(l_{k-2} - l),$$

and a similar derivation works when k − s is odd. It follows that δ is negative unless $l_{k-2} = l$.

41.  EFGHTUXYZVWBCDAPQRJKLMINOSu. 

42. Let $q_j = \mathrm{WT}(P_j)$. Steps C1-C4, which move $q_{k-1} + q_k$ into position between $q_{j-1}$ and $q_j$, can spoil (31) only at the point $i = j - 1$.
43. Invoke the recursive procedure mark($P_1$, 0), where mark(P, l) means the following:

LEVEL(P) ← l;
if LLINK(P) ≠ Λ then mark(LLINK(P), l + 1);
if RLINK(P) ≠ Λ then mark(RLINK(P), l + 1).

44. Set the global variables t ← 0, m ← 2n, and invoke the recursive subroutine build(1), where build(l) means the following:

Set j ← m;
if LEVEL($X_t$) = l then set LLINK($X_j$) ← $X_t$ and t ← t + 1,
    otherwise set m ← m − 1, LLINK($X_j$) ← $X_m$, and build(l + 1);
if LEVEL($X_t$) = l then set RLINK($X_j$) ← $X_t$ and t ← t + 1,
    otherwise set m ← m − 1, RLINK($X_j$) ← $X_m$, and build(l + 1).

The variable j is local to the build routine. [This elegant solution is due to R. E. Tarjan, SICOMP 6 (1977), 639.] Caution: If the numbers $l_0, \ldots, l_n$ do not correspond to any binary tree, the algorithm will loop forever.
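Answers 43 and 44 carry over almost line for line to a language with recursion. In this Python sketch, external nodes are numbered 0 through n and internal nodes n + 1 through 2n as in the text, and dictionaries stand in for the LLINK and RLINK fields (function names hypothetical). It assumes n ≥ 1 and a level sequence that really comes from a binary tree; on bad input this version fails with an exception rather than looping forever. Running `mark` on the result recovers the levels.

```python
def build_from_levels(levels):
    """levels[t] = level of external node X_t (t = 0..n), in symmetric order.
    Returns (root, llink, rlink); internal nodes are X_{n+1}..X_{2n}."""
    n = len(levels) - 1
    llink, rlink = {}, {}
    t, m = 0, 2 * n

    def build(l):
        nonlocal t, m
        j = m                                  # local to each call, as in answer 44
        for link in (llink, rlink):            # first LLINK(X_j), then RLINK(X_j)
            if levels[t] == l:
                link[j] = t                    # attach external node X_t
                t += 1
            else:
                m -= 1
                link[j] = m                    # attach a new internal node X_m
                build(l + 1)

    build(1)
    return 2 * n, llink, rlink

def mark(p, l, llink, rlink, level):
    """Answer 43: LEVEL(P) <- l, then recurse into the nonnull links."""
    level[p] = l
    if p in llink:
        mark(llink[p], l + 1, llink, rlink, level)
    if p in rlink:
        mark(rlink[p], l + 1, llink, rlink, level)

root, llink, rlink = build_from_levels([1, 3, 3, 2])
level = {}
mark(root, 0, llink, rlink, level)
print([level[t] for t in range(4)])            # [1, 3, 3, 2]
```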

45. Maintain the working array $P_0, \ldots, P_t$ as a doubly linked list that also has the links of a balanced tree (see Section 6.2.3). If the 2-descending weights are $q_0, \ldots, q_t$, with $q_u$ at the root of the tree, we can decide whether to proceed left or right in the tree based on the values of $q_u$ and $q_{u+1}$; the double linking provides instant access to $q_{u+1}$. (No RANK fields are needed; rotation preserves symmetric order, so it does not require any changes to the double links.)

Several families of weights for which the problem can be solved in $O(n)$ time have been presented by Hu and Morgenthaler, Lecture Notes in Comp. Sci. 1120 (1996), 234-243; it is unknown whether $O(n)$ steps are sufficient in general.

46.  See  IEEE  Trans.  C-23  (1974),  268-271;  see  also  exercise  6.2.3-21. 

47.  See  Altenkamp  and  Mehlhorn,  JACM  27  (1980),  412-427. 

48. Don’t let the complicated analyses of the cases N = 3 [Jonassen and Knuth, J. Comp. Syst. Sci. 16 (1978), 301-322] or N = 4 [Baeza-Yates, BIT 29 (1989), 378-394] scare you; think big! Some progress has been reported by Louchard, Randrianarimanana, and Schott, Theor. Comp. Sci. 93 (1992), 201-225.

49. This question was first investigated by J. M. Robson [Australian Comp. J. 11 (1979), 151-153], B. Pittel [J. Math. Anal. Applic. 103 (1984), 461-480], and Luc Devroye [JACM 33 (1986), 489-498; Acta Inf. 24 (1987), 277-298], who obtained limit formulas that hold with probability → 1 as $n \to \infty$; see the exposition by H. M. Mahmoud, Evolution of Random Search Trees (Wiley, 1992), Chapter 2. Sharper results were subsequently found by Bruce Reed [JACM 50 (2003), 306-332] and Michael Drmota [JACM 50 (2003), 333-374], who proved that the average height is $\alpha\ln n - (3\alpha\ln\ln n)/(2\alpha - 2) + O(1)$ and the variance is $O(1)$, where

$$\alpha = 1/T\bigl(1/(2e)\bigr) \approx 4.31107\,04070\,01005\,03504\,70760\,96446\,89027\,83916-$$

and $T(z) = \sum_n n^{n-1}z^n/n!$ is the tree function.
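The constant α can be evaluated from the tree function. Since $T(z) = ze^{T(z)}$, the value $T(1/(2e))$ is the fixed point of $t \mapsto e^{t}/(2e)$, and $\alpha = 1/T(1/(2e))$ then satisfies $\alpha\ln(2e/\alpha) = 1$. The following Python sketch (names hypothetical) computes α by fixed-point iteration and compares it with a direct partial sum of the series; double precision gives only about 15 of the digits shown above.

```python
import math

def tree_function(z, iterations=200):
    """T(z) = sum_{n>=1} n^(n-1) z^n / n!, computed from T = z * exp(T).
    The fixed-point iteration converges for 0 <= z < 1/e."""
    t = 0.0
    for _ in range(iterations):
        t = z * math.exp(t)
    return t

def tree_series(z, terms=80):
    """Direct partial sum of the same power series, for comparison."""
    return sum(math.exp((n - 1) * math.log(n) + n * math.log(z) - math.lgamma(n + 1))
               for n in range(1, terms + 1))

z = 1 / (2 * math.e)
alpha = 1 / tree_function(z)
print(alpha)                                   # about 4.311070407001
print(alpha * math.log(2 * math.e / alpha))    # about 1.0
```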

SECTION  6.2.3 

1.  The  symmetric  order  of  nodes  must  be  preserved  by  the  transformation,  otherwise 
we  wouldn’t  have  a binary  search  tree. 
2. B(S) = 0 can happen only when S points to the root of the tree (it has never been changed in steps A3 or A4), and all nodes from S to the point of insertion were balanced.

3. Let $\rho_h$ be the largest possible ratio of unbalanced nodes in balanced trees of height h. Thus $\rho_1 = 0$, $\rho_2 = \rho_3 = \frac12$. We will prove that $\rho_h = (F_{h+1} - 1)/(F_{h+2} - 1)$. Let $T_h$ be a tree that maximizes $\rho_h$; then we may assume that its left subtree has height h − 1 and its right subtree has height h − 2, for if both subtrees had height h − 1 the ratio would be less than $\rho_{h-1}$. Thus the ratio for $T_h$ is at most $(\rho_{h-1}N_l + \rho_{h-2}N_r + 1)/(N_l + N_r + 1)$, where there are $(N_l, N_r)$ nodes in the (left, right) subtree. This formula takes its maximum value when $(N_l, N_r)$ take their minimum values; hence we may assume that $T_h$ is a Fibonacci tree. And $\rho_h < \phi - 1$ by exercise 1.2.8-28.

4.  When  h = 7, 


has greater path length. [Note: C. C. Foster, Proc. ACM Nat. Conf. 20 (1965), 197-198, gave an incorrect procedure for constructing N-node balanced trees of maximum path length; Edward Logg has observed that Foster’s Fig. 3 gives a nonoptimal result after 24 steps (node number 22 can be removed in favor of number 25).]

The Fibonacci tree of order h does, however, minimize the value of $(h + a)N - (\text{external path length}(T))$ over all balanced trees T of height h − 1, when a is any nonnegative constant; this is readily proved by induction on h. Its external path length is $\frac35 hF_{h-1} + \frac45(h - 1)F_h = (\phi/\sqrt5)hF_{h+1} + O(F_{h+1}) = \Theta(h\phi^h)$. Consequently the path length of any N-node balanced tree is at most

$$\min_h\bigl(hN - \Theta(h\phi^h) + O(N)\bigr) \le N\log_\phi N - N\log_\phi\log_\phi N + O(N).$$

Moreover, if N is large and $k = \lfloor\lg N\rfloor$, $h = \lfloor k/\lg\phi - \log_\phi k\rfloor = \log_\phi N - \log_\phi\log_\phi N + O(1)$, we can construct a balanced tree of path length $hN + O(N)$ as follows: Write $N + 1 = F_h + F_{h-1} + \cdots + F_{k+1} + N' = F_{h+2} - F_{k+2} + N'$, and construct a complete binary tree on N′ nodes; then successively join it with Fibonacci trees of orders k, k + 1, ..., h − 1. [See R. Klein and D. Wood, Theoretical Comp. Sci. 72 (1990), 251-264.]
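The closed form for the external path length of the Fibonacci tree is easy to confirm mechanically. The Python sketch below (names hypothetical) builds the shape of the order-h Fibonacci tree, taking orders 0 and 1 to be single external nodes and giving order h the subtrees of orders h − 1 and h − 2, sums the depths of the external nodes, and compares the result with $\frac35 hF_{h-1} + \frac45(h-1)F_h$.

```python
from fractions import Fraction

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fibonacci_tree(h):
    """Shape of the Fibonacci tree of order h; None denotes an external node."""
    if h < 2:
        return None
    return (fibonacci_tree(h - 1), fibonacci_tree(h - 2))

def external_path_length(t, depth=0):
    """Sum of the depths of all external nodes of t."""
    if t is None:
        return depth
    return external_path_length(t[0], depth + 1) + external_path_length(t[1], depth + 1)

def closed_form(h):
    return Fraction(3, 5) * h * fib(h - 1) + Fraction(4, 5) * (h - 1) * fib(h)

print([external_path_length(fibonacci_tree(h)) for h in range(1, 8)])
# [0, 2, 5, 12, 25, 50, 96]
```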

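5. This can be proved by induction; if $T_N$ denotes the tree constructed, we have

$$T_N = \begin{cases}\text{a root with left subtree } T_{2^{n-1}-1} \text{ and right subtree } T_{N-2^{n-1}}, & \text{if } 2^n \le N < 2^n + 2^{n-1};\\ \text{a root with left subtree } T_{2^n-1} \text{ and right subtree } T_{N-2^n}, & \text{if } 2^n + 2^{n-1} \le N < 2^{n+1}.\end{cases}$$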


6.  The  coefficient  of  zn  in  zBj(z)Bk(z)  is  the  number  of  n-node  binary  trees  whose 
left  subtree  is  a balanced  binary  tree  of  height  j and  whose  right  subtree  is  a balanced 
binary  tree  of  height  k. 
7. $C_{n+1} = C_n^2 + 2B_{n-1}B_{n-2}$; hence if we let $a_0 = \ln 2$, $a_1 = 0$, and $a_{n+2} = \ln(1 + 2B_{n+1}B_n/C_{n+2}^2) = O(1/B_nC_{n+2})$, and $\theta = \exp(a_0/2 + a_1/4 + a_2/8 + \cdots)$, we find that $0 < \theta^{2^n} - C_n = C_n\bigl(\exp(a_n/2 + a_{n+1}/4 + \cdots) - 1\bigr) < 1$; thus $C_n = \lfloor\theta^{2^n}\rfloor$.

For general results on doubly exponential sequences, see Fibonacci Quarterly 11 (1973), 429-437. The expression for θ converges rapidly to the value

$$\theta = 1.43687\,28483\,94461\,87580\,04279\,84335\,54862\,92481+.$$

8. Let $b_h = B'_h(1)/B_h(1) + 1$, and let $\epsilon_h = 2B_hB_{h-1}(b_h - b_{h-1})/B_{h+1}$. Then $b_1 = 2$, $b_{h+1} = 2b_h - \epsilon_h$, and $\epsilon_h = O(b_h/B_{h-1})$; hence $b_h = 2^h\beta + \tau_h$, where

$$\beta = 1 - \tfrac14\epsilon_1 - \tfrac18\epsilon_2 - \cdots = 0.70117\,98151\,02026\,33972\,44868\,92779\,46053\,74616+$$

and $\tau_h = \epsilon_h/2 + \epsilon_{h+1}/4 + \cdots$ is extremely small for large h. [Zhurnal Vychisl. Matem. i Matem. Fiziki 6,2 (1966), 389-394. Analogous results for 2-3 trees were obtained by E. M. Reingold, Fib. Quart. 17 (1979), 151-157.]
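The value of β can be reproduced from the definitions in answer 8. In this Python sketch (names hypothetical), the pair $(B_h(1), B'_h(1))$ is computed exactly from $B_{h+1}(z) = z\bigl(B_h(z)^2 + 2B_h(z)B_{h-1}(z)\bigr)$, the recurrence for balanced trees implicit in answer 6, with $B_0(z) = 1$ and $B_1(z) = z$. Because $\tau_h$ vanishes so rapidly, $b_h/2^h$ should already agree with β to double precision at h = 10.

```python
from fractions import Fraction

def b_over_2h(h_max):
    """Yield (h, b_h / 2^h) for h = 1..h_max, where b_h = B'_h(1)/B_h(1) + 1."""
    prev, cur = (1, 0), (1, 1)        # (B_0(1), B'_0(1)) and (B_1(1), B'_1(1))
    for h in range(1, h_max + 1):
        b_h = Fraction(cur[1], cur[0]) + 1
        yield h, b_h / 2**h
        (p0, p1), (c0, c1) = prev, cur
        n0 = c0 * c0 + 2 * c0 * p0                        # B_{h+1}(1)
        n1 = n0 + 2 * c0 * c1 + 2 * (c1 * p0 + c0 * p1)   # B'_{h+1}(1), product rule
        prev, cur = cur, (n0, n1)

estimates = dict(b_over_2h(10))
for h in (2, 4, 6, 8, 10):
    print(h, float(estimates[h]))
```

Exact integers keep the doubly exponential growth of $B_h(1)$ harmless; only the final ratio is converted to floating point.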

9.  Andrew  Odlyzko  has  shown  that  the  number  of  balanced  trees  is  asymptotically 

c /(l°S(x/l0+2)/3  n)/ni 

where c ≈ 1.916067 and f(x) = f(x + 1). His techniques will also yield the average height. [See Congressus Numerantium 42 (1984), 27-52, a paper in which he also
discusses  the  enumeration  of  2-3  trees.] 

10. [Inf. Proc. Letters 17 (1983), 17-20.] Let $X_1, \ldots, X_N$ be nodes whose balance factors $B(X_k)$ are given. To construct the tree, set k ← 0 and compute TREE(∞), where TREE($h_{\max}$) is the following recursive procedure with local variables h, h′, and Q: Set h ← 0, Q ← Λ; then while $h < h_{\max}$ and k < N set k ← k + 1, h′ ← h + B($X_k$), LEFT($X_k$) ← Q, RIGHT($X_k$) ← TREE(h′), h ← max(h, h′) + 1, Q ← $X_k$; return Q. (Tree Q has height h and corresponds to the balance factors that have been read since entry to the procedure.) The algorithm works even if |B($X_k$)| > 1.
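The procedure of answer 10 is short enough to transcribe directly. In this Python sketch (names hypothetical) the nodes are indices 0, ..., N − 1 in symmetric order, `balance[x]` is B(X) = height(right) − height(left), `None` plays the role of Λ, and `math.inf` stands for the initial argument ∞.

```python
import math

def rebuild(balance):
    """Return (root, left, right) for the tree whose nodes, in symmetric order,
    have the given balance factors."""
    n = len(balance)
    left, right = [None] * n, [None] * n
    k = 0                                   # number of nodes consumed so far

    def tree(h_max):
        nonlocal k
        h, q = 0, None                      # q is the tree built so far; its height is h
        while h < h_max and k < n:
            x = k
            k += 1
            h_prime = h + balance[x]        # height the right subtree must reach
            left[x] = q
            right[x] = tree(h_prime)
            h = max(h, h_prime) + 1
            q = x
        return q

    return tree(math.inf), left, right

# Symmetric order 0..4: node 3 is the root, node 1 has children 0 and 2.
root, left, right = rebuild([0, 0, 0, -1, 0])
print(root, left, right)    # 3 [None, 0, None, 1, None] [None, 2, None, 4, None]
```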

11.  Clearly  there  are  as  many  +A’s  as  — B’s  and  +-B’s,  when  n > 2,  and  there  is 
symmetry  between  + and  -.  If  there  are  M nodes  of  types  +A  or  -A,  consideration  of 
all possible cases when n > 1 shows that the next random insertion results in M − 1 such nodes with probability 3M/(n + 1), otherwise it results in M + 1 such nodes. The result follows. [SICOMP 8 (1979), 33-41; Kurt Mehlhorn extended the analysis to deletions in SICOMP 11 (1982), 748-780. See R. A. Baeza-Yates, Computing Surveys 27 (1995), 109-119, for a summary of later developments in such “fringe analyses,” which typically use the methods illustrated in exercise 6.2.4-8.]

12.  The  maximum  occurs  when  inserting  into  the  second  external  node  of  (12);  C = 4, 
Cl  = 3,  D = 3,  A = C 2 = F = G1  = 771  = U\  = 1,  for  a total  time  of  132u. 
The minimum occurs when inserting into the third-last external node of (13); C = 2,
C1 = C2 = 1, D = 2, for a total time of 61u. [The corresponding figures for Program
6.2.2T  are  74u  and  26u.] 

13. When the tree changes, only O(log N) RANK values need to be updated; the
“simple”  system  might  require  very  extensive  changes. 

14.  Yes.  (But  typical  operations  on  lists  are  sufficiently  nonrandom  that  degenerate 
trees  would  probably  occur.) 

15. Use Algorithm 6.2.2T with m set to zero in step T1, and m ← m + RANK(P)
whenever K > KEY(P) in step T2.
16. Delete E; do Case 3 rebalancing at D. Delete G; replace F by G; do Case 2 rebalancing
at  H;  adjust  balance  factor  at  K. 


19.  (Solution  by  Clark  Crane.)  There  is  one  case  that  can’t  be  handled  by  a single  or 
double  rotation  at  the  root,  namely 


and  then  resolve  the  imbalance  by  applying  a single  or  double  rotation  at  C. 
20. It is difficult to insert a new node at the extreme left of the tree


but K.-J. Räihä and S. H. Zweben have devised a general insertion algorithm that takes
O(log N) steps. [CACM 22 (1979), 508-512.]

21.  Algorithm  A does  the  job  in  order  N log  N steps  (see  exercise  5);  the  following 
algorithm  creates  the  same  trees  in  O(N)  steps,  using  an  interesting  iterative  rendition 
of  a recursive  method.  We  use  three  auxiliary  lists: 

$D_1, \ldots, D_l$ (a binary counter that essentially controls the recursion);

$J_1, \ldots, J_l$ (a list of pointers to juncture nodes);

$T_1, \ldots, T_l$ (a list of pointers to trees).

Here $l = \lceil\lg(N + 1)\rceil$. For convenience the algorithm also sets $D_0 \gets 1$, $J_0 \gets J_{l+1} \gets \Lambda$.

G1. [Initialize.] Set $l \gets 0$, $J_0 \gets J_1 \gets \Lambda$, $D_0 \gets 1$.

G2. [Get next item.] Let P point to the next input node. (We may invoke another
coroutine in order to obtain P.) If there is no more input, go to G5. Otherwise,
set $k \gets 1$, $Q \gets \Lambda$, and interchange $P \leftrightarrow J_1$.

G3. [Carry.] If $k > l$ (or, equivalently, if $P = \Lambda$), set $l \gets l + 1$, $D_k \gets 1$, $T_k \gets Q$,
$J_{k+1} \gets \Lambda$, and return to G2. Otherwise set $D_k \gets 1 - D_k$, interchange $Q \leftrightarrow T_k$,
$P \leftrightarrow J_{k+1}$, and increase k by 1. If now $D_{k-1} = 0$, repeat this step.

G4. [Concatenate.] Set $\mathrm{LLINK}(P) \gets T_k$, $\mathrm{RLINK}(P) \gets Q$, $B(P) \gets 0$, $T_k \gets P$, and
return to G2.

G5. [Finish up.] Set $\mathrm{LLINK}(J_k) \gets T_k$, $\mathrm{RLINK}(J_k) \gets J_{k-1}$, $B(J_k) \gets 1 - D_{k-1}$, for
$1 \le k \le l$. Then terminate the algorithm; $J_l$ points to the root of the desired
tree. |

Step G3 is executed $2N - \nu(N)$ times, where $\nu(N)$ is the number of 1s in the binary
representation of N.
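Steps G1-G5 translate almost line for line into the Python sketch below. It is an illustration under stated assumptions, not the book's code: `Node` is a hypothetical record with fields `key`, `llink`, `rlink`, and balance factor `b`; the dictionaries `D`, `J`, `T` stand for the auxiliary lists (index 0 included); `None` stands for Λ; and the input is any iterable of keys already in increasing order. The function also counts executions of step G3, so the 2N − ν(N) claim can be checked.

```python
class Node:
    __slots__ = ("key", "llink", "rlink", "b")

    def __init__(self, key):
        self.key, self.llink, self.rlink, self.b = key, None, None, 0


def build_balanced(keys):
    """Algorithm G sketch: returns (root, number of times step G3 ran)."""
    D, J, T = {0: 1}, {0: None, 1: None}, {}    # G1: initialize
    l = 0
    g3_steps = 0
    for key in keys:                            # G2: get next item
        P = Node(key)
        k, Q = 1, None
        P, J[1] = J[1], P
        while True:                             # G3: carry
            g3_steps += 1
            if k > l:                           # equivalently, P is None
                l += 1
                D[k], T[k], J[k + 1] = 1, Q, None
                break
            D[k] = 1 - D[k]
            Q, T[k] = T[k], Q
            P, J[k + 1] = J[k + 1], P
            k += 1
            if D[k - 1] != 0:                   # G4: concatenate
                P.llink, P.rlink, P.b = T[k], Q, 0
                T[k] = P
                break
    for k in range(1, l + 1):                   # G5: finish up
        J[k].llink, J[k].rlink, J[k].b = T[k], J[k - 1], 1 - D[k - 1]
    return J[l], g3_steps


root, steps = build_balanced("abcdefgh")
print(root.key, steps)   # d 15
```

The counter `D` behaves like the binary representation of the number of items read so far (D_1 is the low-order bit), which is why the carries in G3 add up to 2N − ν(N).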

22. The height of a weight-balanced tree with N internal nodes always lies between
$\lg(N + 1)$ and $2\lg(N + 1)$. To get this upper bound, note that the heavier subtree of
the root has at most $(N + 1)/\sqrt2$ external nodes.

23. (a) Form a tree whose right subtree is a complete binary tree with $2^n - 1$ nodes,
and whose left subtree is a Fibonacci tree with $F_{n+1} - 1$ nodes. (b) Form a weight-balanced
tree whose right subtree is about $2\lg N$ levels high and whose left subtree is
about $\lg N$ levels high (see exercise 22).

24. Consider a smallest tree that satisfies the condition but is not perfectly balanced.
Then its left and right subtrees are perfectly balanced, so they have $2^l$ and $2^r$ external
nodes, respectively, where $l \ne r$. But this contradicts the stated condition.

25. After inserting a node at the bottom of the tree, we work up from the bottom to
check the weight balance at each node on the search path. Suppose imbalance occurs
at node A in (1), after we have inserted a new node in the right subtree, where B and
its subtrees are weight-balanced. Then a single rotation will restore the balance unless
$(|\alpha| + |\beta|)/|\gamma| > \sqrt2 + 1$, where $|x|$ denotes the number of external nodes in a tree x.

But in this case it can be shown that a double rotation will suffice. [See SICOMP 2
(1973), 33-43.]

27.  It  is  sometimes  necessary  to  make  two  comparisons  in  nodes  that  contain  two  keys. 
The worst case occurs in a tree like the following, which sometimes needs $2\lg(N + 2) - 2$
comparisons: 


29. Partial solution by A. Yao: With $N \ge 6$ keys the lowest level will contain an
average of $\tfrac27(N + 1)$ one-key nodes and $\tfrac17(N + 1)$ two-key nodes. The average total
number of nodes lies between 0.70N and 0.79N, for large N. [Acta Informatica 9
(1978), 159-170.]
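The fringe counts can be recomputed exactly with a short fringe-analysis sketch. It assumes the usual random-insertion model: each of the N + 1 external nodes is equally likely to receive the next key, a one-key leaf that is hit becomes a two-key leaf, and a two-key leaf that is hit splits into two one-key leaves (its middle key moves up). Since these transitions are linear in the counts, the expected counts obey the recurrence below exactly; `expected_fringe` is a hypothetical name.

```python
from fractions import Fraction


def expected_fringe(n_max):
    """Exact expected numbers (a, b) of one-key and two-key nodes on the
    lowest level of a random 2-3 tree with n keys, for n = 1, ..., n_max."""
    a, b = Fraction(1), Fraction(0)   # one key: a single one-key leaf
    result = {1: (a, b)}
    for n in range(1, n_max):
        p = Fraction(1, n + 1)        # chance for each of the n + 1 external nodes
        # A one-key leaf (2 external nodes) is hit with probability 2p and
        # becomes a two-key leaf; a two-key leaf (3 external nodes) is hit
        # with probability 3p and splits into two one-key leaves.
        a, b = a - 2 * p * a + 2 * (3 * p * b), b + 2 * p * a - 3 * p * b
        result[n + 1] = (a, b)
    return result


for n, (a, b) in sorted(expected_fringe(8).items()):
    print(n, a, b)
```

The printed values settle on a = 2(n + 1)/7 and b = (n + 1)/7 from n = 6 onward, which is the stationary solution of the recurrence together with 2a + 3b = n + 1.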

30.  For  best-fit,  arrange  the  records  in  order  of  size,  with  an  arbitrary  rule  to  break 
ties  in  case  of  equality.  (See  exercise  2.5-9.)  For  first  fit,  arrange  the  records  in  order 
of  location,  with  an  extra  field  in  each  node  telling  the  size  of  the  largest  area  in 
the  subtree  rooted  at  that  node.  This  extra  field  can  be  maintained  under  insertions 
and deletions. (Although the running time is O(log n), it probably still doesn’t beat
the  “ROVER”  method  of  exercise  2.5-6  in  practice;  but  the  memory  distribution  may 
be  better  without  ROVER,  since  there  will  usually  be  a nice  large  empty  region  for 
emergencies.) 

An  improved  method  has  been  developed  by  R.  P.  Brent,  ACM  Trans.  Prog. 
Languages  and  Systems  11  (1989),  388-403. 

31.  Use  a nearly  balanced  tree,  with  additional  upward  links  for  the  leftmost  part, 
plus  a stack  of  postponed  balance  factor  adjustments  along  this  path.  (Each  insertion 
does  a bounded  number  of  these  adjustments.) 

This problem can be generalized to require O(log m) steps to find, insert, and/or
delete  items  that  are  m steps  away  from  any  given  “finger,”  where  any  key  once  located 
can  serve  as  a finger  in  later  operations.  [See  S.  Huddleston  and  K.  Mehlhorn,  Acta 
Inf.  17  (1982),  157-184.] 

32. Each right rotation increases one of the r’s and leaves the others unchanged; hence
$r_k \le r_k'$ for all k is necessary. To show that it is sufficient, suppose $r_j = r_j'$ for $1 \le j < k$ but
$r_k < r_k'$. Then there is a right rotation that increases $r_k$ to a value $\le r_k'$, because the
numbers $r_1r_2\ldots r_n$ satisfy the condition of exercise 2.3.3-19(a).
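To make the rotation argument concrete, the following Python sketch (an illustration, not from the source) represents a binary tree by nested pairs (left, right), computes $r_1 \ldots r_n$ as the right-subtree sizes in symmetric order, and checks exhaustively for small n that one right rotation y(x(A, B), C) → x(A, y(B, C)) changes exactly one $r_k$ and increases it. All helper names are hypothetical.

```python
def size(t):
    return 0 if t is None else 1 + size(t[0]) + size(t[1])


def right_sizes(t, out=None):
    """r_1 r_2 ... r_n: right-subtree sizes of the nodes in symmetric order."""
    if out is None:
        out = []
    if t is not None:
        right_sizes(t[0], out)
        out.append(size(t[1]))
        right_sizes(t[1], out)
    return out


def all_trees(n):
    """Every binary tree with n nodes, as nested (left, right) pairs."""
    if n == 0:
        yield None
        return
    for k in range(n):
        for left in all_trees(k):
            for right in all_trees(n - 1 - k):
                yield (left, right)


def right_rotations(t):
    """Every tree obtained from t by one right rotation
    y(x(A, B), C) -> x(A, y(B, C)), applied at any node."""
    if t is None:
        return
    left, right = t
    if left is not None:
        a, b = left
        yield (a, (b, right))
    for new_left in right_rotations(left):
        yield (new_left, right)
    for new_right in right_rotations(right):
        yield (left, new_right)


# One right rotation changes exactly one r_k, and increases it:
for t in all_trees(5):
    r = right_sizes(t)
    for u in right_rotations(t):
        s = right_sizes(u)
        changed = [i for i in range(len(r)) if r[i] != s[i]]
        assert len(changed) == 1 and s[changed[0]] > r[changed[0]]
```

Only node x gains anything (its right subtree grows from B to y(B, C)); every other node keeps the same right subtree size, which is the first half of the argument above.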

Notes: This partial ordering, first introduced by D. Tamari in 1951, has many
interesting properties. Any two trees have a greatest lower bound $T \wedge T'$, determined
by the right-subtree sizes $\min(r_1, r_1')\min(r_2, r_2')\ldots\min(r_n, r_n')$, as well as a least upper
bound $T \vee T'$ determined by the left-subtree sizes $\min(l_1, l_1')\min(l_2, l_2')\ldots\min(l_n, l_n')$.
The left-subtree sizes are, of course, one less than the RANK fields of Algorithms B and C.
For further information, see H. Friedman and D. Tamari, J. Combinatorial Theory 2
(1967), 215-242, 4 (1968), 201; C. Greene, Europ. J. Combinatorics 9 (1988), 225-240;
D. D. Sleator, R. E. Tarjan, and W. P. Thurston, J. Amer. Math. Soc. 1 (1988),
647-681; J. M. Pallo, Theoretical Informatics and Applic. 27 (1993), 341-348; M. K.
Bennett and G. Birkhoff, Algebra Universalis 32 (1994), 115-144; P. H. Edelman and
V. Reiner, Mathematika 43 (1996), 127-154.

33. First, we can reduce the storage to one bit A(P) in each node P, so that B(P) =
A(RLINK(P)) − A(LLINK(P)) whenever LLINK(P) and RLINK(P) are both nonnull; otherwise
B(P) is known already. Moreover, we can assume that A(P) = 0 whenever
LLINK(P) and RLINK(P) are both null. Then A(P) can be eliminated in all other nodes
by swapping LLINK(P) with RLINK(P) whenever A(P) = 1; a comparison of KEY(P) with
KEY(LLINK(P)) or KEY(RLINK(P)) will determine A(P).

Of  course,  on  machines  for  which  pointers  are  always  even,  two  unused  bits  are 
present  already  in  every  node.  Further  economies  are  possible  as  in  exercise  2.3.1-37. 


SECTION  6.2.4 


1.  Two  nodes  split: 


2.  Altered  nodes: 


(Of course a B*-tree would have no nonroot 3-key nodes, although Fig. 30 does.)

3. (a) $1 + 2\cdot50 + 2\cdot51\cdot50 + 2\cdot51\cdot51\cdot50 = 2\cdot51^3 - 1 = 265301$.

(b) $1 + 2\cdot50 + (2\cdot51\cdot100 - 100) + ((2\cdot51\cdot101 - 100)\cdot100 - 100) = 101^3 = 1030301$.

(c) $1 + 2\cdot66 + (2\cdot67\cdot66 + 2) + (2\cdot67\cdot67\cdot66 + 2\cdot67) = 601661$. (Less than (b)!)
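A quick arithmetic check of the three totals follows. The grouping of terms in (b) is read from a damaged scan; the grouping used here is the one that reproduces the stated value 101³ = 1030301.

```python
# (a) the four-level sum, term by term
a = 1 + 2 * 50 + 2 * 51 * 50 + 2 * 51 * 51 * 50
# (b) grouping chosen to match the stated total
b = 1 + 2 * 50 + (2 * 51 * 100 - 100) + ((2 * 51 * 101 - 100) * 100 - 100)
# (c)
c = 1 + 2 * 66 + (2 * 67 * 66 + 2) + (2 * 67 * 67 * 66 + 2 * 67)

print(a, a == 2 * 51**3 - 1)   # 265301 True
print(b, b == 101**3)          # 1030301 True
print(c, c < b)                # 601661 True
```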

4.  Before  splitting  a nonroot  node,  make  sure  that  it  has  two  full  siblings,  then 
split  these  three  nodes  into  four.  The  root  should  split  only  when  it  has  more  than 
$3\lfloor(3m - 3)/4\rfloor$ keys.

5.  Interpretation  1,  trying  to  maximize  the  stated  minimum:  450.  (The  worst  case 
occurs  if  we  have  1005  characters  and  the  key  to  be  passed  to  the  parent  must  be  50 
characters  long:  445  chars  + ptr  + 50-char  key  + ptr  + 50-char  key  + ptr  + 445  chars.) 

Interpretation  2,  trying  to  equalize  the  number  of  keys  after  splitting,  in  order  to 
keep  branching  factors  high:  155  (15  short  keys  followed  by  16  long  ones). 

See  E.  M.  McCreight,  CACM  20  (1977),  670-674,  for  further  comments. 

6. If the key to be deleted is not on level $l - 1$, replace it by its successor and delete
the successor. To delete a key on level $l - 1$, we simply erase it; if this makes the node
too empty, we look at its right (or left) sibling, and “underflow,” that is, move keys in
from  the  sibling  so  that  both  nodes  have  approximately  the  same  amount  of  data.  This 
underflow  operation  will  fail  only  if  the  sibling  was  minimally  full,  but  in  that  case  the 
two  nodes  can  be  collapsed  into  one  (together  with  one  key  from  their  parent);  such 
collapsing  may  cause  the  parent  in  turn  to  underflow,  etc.  With  variable-length  keys 
as  in  exercise  5,  a parent  node  may  need  to  split  when  one  of  its  keys  becomes  longer. 

8. Given a tree T with N internal nodes, let there be $a_k^{(j)}$ external nodes that require
k accesses and whose parent node belongs to a page containing j keys; and let $A^{(j)}(z)$ be
the corresponding generating function. Thus $A^{(1)}(1) + \cdots + A^{(M)}(1) = N + 1$. (Note
that $a_k^{(j)}$ is a multiple of $j + 1$, for $1 \le j < M$.) The next random insertion leads to
$N + 1$ equally probable trees, whose generating functions are obtained by decreasing
some coefficient $a_k^{(j)}$ by $j + 1$ and adding $j + 2$ to $a_k^{(j+1)}$; or (if $j = M$) by decreasing
some $a_k^{(M)}$ by 1 and adding 2 to $a_{k+1}^{(1)}$. Now $B_N^{(j)}(z)$ is $(N + 1)^{-1}$ times the sum, over all
trees T, of the generating function $A^{(j)}(z)$ for T times the probability that T occurs;
the stated recurrence relations follow.

The  recurrence  has  the  form 


(z))T  = (I+(N+  I)"1  WWXBjlLAz), . 
= ...=gN(W(z))(  0,...,0,1)T, 


where 


9n(x) 


t)  •■•(>+§) 


1 / s + n + IN 

z + lV  n + 1 )' 


It  follows  that  C'N  = (1, . . . , 1)(B$'(1), . . . , B$)'(1))T  = 2B$J1(1)/(N+1)+C'n_1  = 
2 }n{W)mm,  where  fn(x)  = gn- i(x)/(n  + 1)  + • • ■ + go(x)/2  = (g„(x)  - l)/x,  and 
W = W(l).  (The  subscript  MM  denotes  the  lower  right  corner  element  of  the  matrix.) 
Now $W = S^{-1}\operatorname{diag}(\lambda_1, \ldots, \lambda_M)S$, for some matrix S, where $\operatorname{diag}(\lambda_1, \ldots, \lambda_M)$ denotes
the diagonal matrix whose entries are the roots of $\chi(\lambda) = (\lambda + 2)\ldots(\lambda + M + 1) - (M + 1)!$.
(The roots are distinct, since $\chi(\lambda) = \chi'(\lambda) = 0$ implies $1/(\lambda + 2) + \cdots + 1/(\lambda + M + 1) = 0$;
the latter can hold only when λ is real, and $-M - 1 < \lambda < -2$, which implies
that $|\lambda + 2|\ldots|\lambda + M + 1| < (M + 1)!$, a contradiction.) If p(x) is any polynomial,
$p(W) = p(S^{-1}\operatorname{diag}(\lambda_1, \ldots, \lambda_M)S) = S^{-1}\operatorname{diag}(p(\lambda_1), \ldots, p(\lambda_M))S$; hence the lower
right corner element of p(W) has the form $c_1p(\lambda_1) + \cdots + c_Mp(\lambda_M)$ for some constants
$c_1, \ldots, c_M$ independent of p. These constants may be evaluated by setting
$p(\lambda) = \chi(\lambda)/(\lambda - \lambda_j)$; since $(W^k)_{MM} = (-2)^k$ for $0 \le k \le M - 1$, we have
$p(W)_{MM} = p(-2) = (M + 1)!/(\lambda_j + 2) = c_jp(\lambda_j) = c_j\chi'(\lambda_j) = c_j(M + 1)!\,\bigl(1/(\lambda_j + 2) + \cdots + 1/(\lambda_j + M + 1)\bigr)$;
hence $c_j = (\lambda_j + 2)^{-1}\bigl(1/(\lambda_j + 2) + \cdots + 1/(\lambda_j + M + 1)\bigr)^{-1}$. This yields an “explicit”
formula $C'_N = \sum_{j=1}^{M} 2c_jf_N(\lambda_j)$; and it remains to study the roots $\lambda_j$. Note that
$|\lambda_j + M + 1| \le M + 1$ for all j, otherwise we would have $|\lambda_j + 2|\ldots|\lambda_j + M + 1| > (M + 1)!$.
Taking $\lambda_1 = 0$, this implies that $\Re(\lambda_j) < 0$ for $2 \le j \le M$. By Eq. 1.2.5-(15),
$g_n(x) \sim (n + 1)^x/\Gamma(x + 2)$ as $n \to \infty$; hence $g_n(\lambda_j) \to 0$ for $2 \le j \le M$. Consequently
$C'_N = 2c_1f_N(0) + O(1) = H_N/(H_{M+1} - 1) + O(1)$.

Notes: The analysis above is relevant also to the “samplesort” algorithm discussed
briefly in Section 5.2.2. The calculations may readily be extended to show that
$B_N^{(j)}(1) \sim (H_{M+1} - 1)^{-1}/(j + 2)$ for $1 \le j < M$, $B_N^{(M)}(1) \sim (H_{M+1} - 1)^{-1}/2$. Hence the
total number of interior nodes on unfilled pages is approximately


$$\Bigl(\frac{1}{3\times2} + \frac{2}{4\times3} + \cdots + \frac{M-1}{(M+1)\times M}\Bigr)\frac{N}{H_{M+1}-1} = \Bigl(1 - \frac{M}{(M+1)(H_{M+1}-1)}\Bigr)N,$$
and the total number of pages used is approximately

$$\Bigl(\frac{1}{3\times2} + \frac{1}{4\times3} + \cdots + \frac{1}{(M+1)\times M} + \frac{1}{M+1}\Bigr)\frac{N}{H_{M+1}-1} = \frac{N}{2(H_{M+1}-1)},$$

yielding an asymptotic storage utilization of $2(H_{M+1} - 1)/M$.
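Both approximations can be checked mechanically from the limiting shares $B_N^{(j)}(1)$. The sketch below uses exact rational arithmetic and hypothetical helper names; it assumes, as in the model of this exercise, that a page holding j < M keys has j + 1 external nodes and that all remaining keys sit on full pages of M keys.

```python
from fractions import Fraction


def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))


def page_statistics(M):
    """Per-key limits implied by the shares B_N^{(j)}(1):
    returns (keys on unfilled pages, pages used, storage utilization),
    the first two as multiples of N."""
    d = harmonic(M + 1) - 1
    share = {j: 1 / (d * (j + 2)) for j in range(1, M)}  # pages with j < M keys
    share[M] = 1 / (2 * d)                               # full pages
    # A page holding j < M keys accounts for j + 1 external nodes.
    unfilled_keys = sum(share[j] * Fraction(j, j + 1) for j in range(1, M))
    unfilled_pages = sum(share[j] / (j + 1) for j in range(1, M))
    full_pages = (1 - unfilled_keys) / M   # remaining keys sit on full pages
    pages = unfilled_pages + full_pages
    return unfilled_keys, pages, 1 / (M * pages)


print(page_statistics(2))
# (Fraction(1, 5), Fraction(3, 5), Fraction(5, 6))
```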

This analysis has been extended by Mahmoud and Pittel [J. Algorithms 10 (1989),
52-75], who discovered that the variance of the storage utilization undergoes a surprising
phase transition: When $M \le 25$, the variance is $\Theta(N)$; but when $M \ge 26$ it is
asymptotically  f(N)N1+2a  where  f(e*^N)  = f(N),  if  — | + a + pi  and  — | + a — /3i 
are  the  nonzero  roots  Xj  with  largest  real  part. 

The  height  of  such  trees  has  been  analyzed  by  L.  Devroye  [Random  Structures  & 
Algorithms  1 (1990),  191-203];  see  also  B.  Pittel,  Random  Structures  & Algorithms  5 
(1994),  337-347. 

9. Yes; for example we could replace each $K_i$ in (1) by i plus the number of keys in
subtrees $P_0, \ldots, P_{i-1}$. The search, insertion, and deletion algorithms can be modified
appropriately. 

10.  Brief  sketch:  Extend  the  paging  scheme  so  that  exclusive  access  to  buffers  is  given 
to  one  user  at  a time;  the  search,  insertion,  and  deletion  algorithms  must  be  carefully 
modified  so  that  such  exclusive  access  is  granted  only  for  a limited  time  when  absolutely 
necessary,  and  in  such  a way  that  no  deadlocks  can  occur.  For  details,  see  B.  Samadi, 
Inf.  Proc.  Letters  5 (1976),  107-112;  R.  Bayer  and  M.  Schkolnick,  Acta  Inf.  9 (1977), 
1-21;  Y.  Sagiv,  J.  Comp.  Syst.  Sci.  33  (1986),  275-296. 

SECTION  6.3 

1.  Lieves  (the  plural  of  “lief”). 

2. Perform Algorithm T using the new key as argument; it will terminate unsuccessfully
in either step T3 or T4. If in T3, simply set table entry k of NODE(P) to K and
terminate the insertion algorithm. Otherwise set this table entry to the address of a
new node Q ⇐ AVAIL, containing only null links, then set P ← Q. Now set k and k′ to
the respective next characters of K and X; if k ≠ k′, store K in position k of NODE(P)
and store X in position k′, but if k = k′ again make the k position point to a new
node Q ⇐ AVAIL, set P ← Q, and repeat the process until eventually k ≠ k′. (We must
assume that no key is a prefix of another.)

3.  Replace  the  key  by  a null  link,  in  the  node  where  it  appears.  If  this  node  is  now 
useless because all its entries are null except one that is a key X, delete the node and
replace  the  corresponding  pointer  in  its  parent  by  X.  If  the  parent  node  is  now  useless, 
delete  it  in  the  same  way. 
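The insertion procedure of exercise 2 and the deletion procedure just described can be sketched together in Python. In this illustration each trie node is a dict from characters to either a stored key or a child node (an absent entry plays the role of a null link); stopping at an empty slot corresponds to ending in step T3, and finding a different key X corresponds to T4. Keys in the example end with a terminator `#` so that no key is a prefix of another, as exercise 2 requires. All names are hypothetical.

```python
class TrieNode:
    """One trie node: maps a character to either a key (str) or a TrieNode."""

    def __init__(self):
        self.table = {}


def search(root, key):
    p = root
    for c in key:
        entry = p.table.get(c)
        if entry is None:
            return False
        if isinstance(entry, TrieNode):
            p = entry
            continue
        return entry == key
    return False


def insert(root, key):
    p, i = root, 0
    while True:
        if i >= len(key):
            raise ValueError("no key may be a prefix of another")
        c = key[i]
        entry = p.table.get(c)
        if entry is None:               # empty slot: store the key here
            p.table[c] = key
            return
        if isinstance(entry, TrieNode):
            p, i = entry, i + 1
            continue
        if entry == key:
            return                      # already present
        x = entry                       # slot holds a different key X
        while True:
            q = TrieNode()              # new node with only null links
            p.table[c] = q
            p, i = q, i + 1
            if i >= len(key) or i >= len(x):
                raise ValueError("no key may be a prefix of another")
            c, cx = key[i], x[i]
            if c != cx:                 # the keys finally differ
                p.table[c], p.table[cx] = key, x
                return


def delete(root, key):
    path = []                           # (node, character) pairs from the root
    p = root
    for c in key:
        entry = p.table.get(c)
        if entry is None:
            return False
        path.append((p, c))
        if isinstance(entry, TrieNode):
            p = entry
            continue
        if entry != key:
            return False
        del p.table[c]
        # A nonroot node left holding a single key and nothing else is
        # useless: replace the parent's pointer to it by that key, and repeat.
        while len(path) > 1:
            node = path[-1][0]
            if len(node.table) != 1:
                break
            (only,) = node.table.values()
            if isinstance(only, TrieNode):
                break
            path.pop()
            parent, pc = path[-1]
            parent.table[pc] = only
        return True
    return False


root = TrieNode()
for w in ["CAT#", "CAR#", "CART#", "DOG#", "DOT#", "A#"]:
    insert(root, w)
print(search(root, "CART#"), search(root, "CA#"))   # True False
```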

4.  Successful  searches  take  place  exactly  as  with  the  full  table,  but  unsuccessful 
searches  in  the  compressed  table  may  go  through  several  additional  iterations.  For 
example,  an  input  argument  such  as  TRASH  will  make  Program  T take  six  iterations 
(more  than  five!);  this  is  the  worst  case.  It  is  necessary  to  verify  that  no  infinite 
looping  on  blank  sequences  is  possible.  (This  remarkable  49-place  packing  is  due  to 
J.  Scot  Fishburn,  who  also  showed  that  48  places  do  not  suffice.) 

A slower  but  more  versatile  way  to  economize  on  trie  storage  has  been  proposed 
by  Kurt  Maly,  CACM  19  (1976),  409-415. 

In general, if we want to compress n sparse tables containing respectively $x_1, \ldots, x_n$
nonzero entries, a first-fit method that offsets the jth table by the minimum
amount $r_j$ that will not conflict with the previously placed tables will have
$r_j \le (x_1 + \cdots + x_{j-1})x_j$, since each previous nonzero entry can block at most $x_j$ offsets.
This worst-case estimate gives $r_j \le 93$ for the data in Table 1, guaranteeing that any
twelve tables of length 30 containing respectively 10, 5, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2 nonzero
entries can be packed into 93 + 30 consecutive locations regardless of the pattern of
the nonzeros. Further refinements of this method have been developed by R. E. Tarjan
and A. C. Yao, CACM 22 (1979), 606-611. A dynamic implementation of compressed
tries, due to F. M. Liang, is used for hyphenation tables in the TeX typesetting system;
see D. E. Knuth, CACM 29 (1986), 471-478; Literate Programming (1992), 206-233.
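The first-fit packing and its worst-case guarantee are easy to try out. In this sketch (hypothetical names) each sparse table is given as a set of nonzero positions, offsets are assumed nonnegative, and each table is placed at the least offset that avoids every cell already occupied.

```python
def first_fit_offsets(tables):
    """tables: list of sets of nonzero positions. Each table is placed at the
    least offset r >= 0 where none of its entries hits an occupied cell."""
    used, offsets = set(), []
    for positions in tables:
        r = 0
        while any(r + i in used for i in positions):
            r += 1
        used.update(r + i for i in positions)
        offsets.append(r)
    return offsets


def worst_case_bounds(counts):
    """The guarantee r_j <= (x_1 + ... + x_{j-1}) * x_j, for each j."""
    bounds, earlier = [], 0
    for x in counts:
        bounds.append(earlier * x)
        earlier += x
    return bounds


counts = [10, 5, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2]
print(max(worst_case_bounds(counts)))   # 93
```

The maximum of the bound occurs at j = 8, where 31 earlier nonzero entries can each block at most 3 offsets.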

5.  In  each  family,  test  for  the  most  probable  outcome  first,  by  arranging  the  letters 
from  left  to  right  in  decreasing  order  of  probability.  The  optimality  of  this  arrangement 
can be proved as in Theorem 6.1S. [See CACM 12 (1969), 72-76.]


7.  For  example,  8,  4,  1,  2,  3,  5,  6,  7,  12,  9,  10,  11,  13,  14,  15.  (No  matter  what 
sequence  is  used,  the  left  subtree  cannot  contain  more  than  two  nodes  on  level  4,  nor 
can  the  right  subtree.)  Even  this  “worst”  tree  is  within  4 of  the  best  possible  tree,  so 
we  see  that  digital  search  trees  aren’t  very  sensitive  to  the  order  of  insertion. 
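The claims about this insertion order can be checked with a short sketch. It assumes the 15 keys are the 4-bit numbers 1 to 15, that bits are inspected from the most significant end as in Algorithm D, that the root is on level 0, and that "within 4 of the best possible tree" refers to internal path length; these readings are our assumptions, and the function names are hypothetical.

```python
def dst_insert(root, key, bits):
    """Insert `key` into a digital search tree whose nodes are lists
    [key, left, right], branching on bits from the most significant end."""
    if root is None:
        return [key, None, None]
    p, b = root, bits - 1
    while p[0] != key:
        if b < 0:
            raise ValueError("keys must fit in `bits` bits")
        side = 2 if (key >> b) & 1 else 1   # right on a 1 bit, left on a 0 bit
        if p[side] is None:
            p[side] = [key, None, None]
            break
        p, b = p[side], b - 1
    return root


def internal_path_length(t, depth=0):
    if t is None:
        return 0
    return (depth + internal_path_length(t[1], depth + 1)
            + internal_path_length(t[2], depth + 1))


root = None
for key in [8, 4, 1, 2, 3, 5, 6, 7, 12, 9, 10, 11, 13, 14, 15]:
    root = dst_insert(root, key, bits=4)
print(internal_path_length(root))   # 38
```

A perfectly balanced 15-node tree has internal path length 0 + 2·1 + 4·2 + 8·3 = 34, so this tree is 4 worse.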

8.  Yes.  The  KEY  fields  now  contain  only  a truncated  key;  leading  bits  implied  by  the 
node  position  are  chopped  off.  (A  similar  modification  of  Algorithm  T is  possible.) 


9.  START  LDX   K            1      D1. Initialize. (rX ≡ K)
           LD1   ROOT         1      P ← ROOT. (rI1 ≡ P)
           JMP   2F           1
    4H     LD2   0,1(RLINK)   C2     D4. Move right. Q ← RLINK(P).
           J2Z   5F           C2     To D5 if Q = Λ.
    1H     ENT1  0,2          C − 1  P ← Q.
    2H     CMPX  1,1          C      D2. Compare.
           JE    SUCCESS      C      Exit if K = KEY(P).
           SLB   1            C − S  Shift K left one bit.
           JAO   4B           C − S  To D4 if the detached bit was 1.
           LD2   0,1(LLINK)   C1     D3. Move left. Q ← LLINK(P).
           J2NZ  1B           C1     To D2 with P ← Q if Q ≠ Λ.
    5H     Continue as in Program 6.2.2T, interchanging the roles of rA and rX.  |
The running time for the searching phase of this program is (10C − 3S + 4)u, where
C − S is the number of bit inspections. For random data, the approximate average
running times are therefore:

                      Successful          Unsuccessful
  Program 6.2.2T      15 ln N − 12.34     15 ln N − 2.34
  This program        14.4 ln N − 6.17    14.4 ln N + 1.26

(Consequently Program 6.2.2T is a shade faster unless N is very large.)

10. Let ⊕ denote the exclusive or operation on n-bit numbers, and let $f(x) = n - \lceil\lg(x + 1)\rceil$
be the number of leading zero bits of x. One solution: (b) If a search
via Algorithm T ends unsuccessfully in step T3, k is one less than the number of
bit inspections made so far; otherwise if the search ends in step T4, $k = f(K \oplus X)$.
(a, c) Do a regular search, but also keep track of the minimum value, x, of $K \oplus \mathrm{KEY}(P)$
over all KEY(P) compared with K during the search. Then $k = f(x)$. (Prove that no
other key can have more bits in common with K than those compared to K. In case (a),
the maximum k occurs for either the largest key < K or the smallest key > K.)
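The XOR formulation is easy to experiment with. In the sketch below (hypothetical names; keys are Python ints treated as n-bit numbers), `int.bit_length()` computes ⌈lg(x + 1)⌉, so `leading_zeros` is exactly f(x), and the longest common prefix with K comes from the smallest K ⊕ key because f is nonincreasing.

```python
def leading_zeros(x, n):
    """f(x) = n - ceil(lg(x + 1)): the leading zero bits of the n-bit number x."""
    return n - x.bit_length()


def common_prefix(K, keys, n):
    """Most leading bits that K shares with any key, via the least K xor key."""
    return leading_zeros(min(K ^ key for key in keys), n)


print(common_prefix(0b1011, [0b1000, 0b0011, 0b1010], 4))   # 3
```

The keys sharing a given prefix with K form a contiguous range in sorted order that contains K's position, which is why the predecessor or successor of K always attains the maximum in case (a); the test below checks this by brute force.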

11.  No;  eliminating  a node  with  only  one  empty  subtree  will  “forget”  one  bit  in  the 
keys  of  the  nonempty  subtree.  To  delete  a node,  we  should  replace  it  by  one  of  its 
terminal  descendants,  for  example  by  searching  to  the  right  whenever  possible. 

12. Insert three random numbers α, β, γ between 0 and 1 into an initially empty tree;
then delete α with probability p, β with probability q, γ with probability r, using the
algorithm  suggested  in  the  previous  exercise.  The  tree 


is  obtained  with  probability  \p  + \q  + |r,  and  this  is  | only  if  p = 0. 

13.  Add  a KEY  field  to  each  node,  and  compare  K with  this  key  before  looking  at 
the  vector  element  in  step  T2.  Table  1 would  change  as  follows:  Nodes  (1),  . . . , (12) 
would  contain  the  respective  keys  THE,  AND,  BE,  FOR,  HIS,  IN,  OF,  TO,  WITH,  HAVE,  HE, 
THAT  (if  we  inserted  them  in  order  of  decreasing  frequency),  and  these  keys  would  be 
deleted  from  their  previous  positions.  [The  corresponding  program  would  be  slower  and 
more complicated than Program T, in this case. A more direct M-ary generalization of
Algorithm  D would  create  a tree  with  N nodes,  having  one  key  and  M links  per  node.] 

14.  If  j < n,  there  is  only  one  place,  namely  KEY(P).  But  if  j > n,  the  set  of  all 
occurrences  is  found  by  traversing  the  subtree  of  node  P:  If  there  are  r occurrences,  this 
subtree  contains  r — 1 nodes  (including  node  P),  and  so  it  has  r link  fields  with  TAG  = 1; 
these  link  fields  point  to  all  the  nodes  that  reference  TEXT  positions  matching  K.  (It  isn’t 
necessary  to  check  the  TEXT  again  at  all.) 

15. To begin forming the tree, set KEY(HEAD) to the first TEXT reference, and set
LLINK(HEAD) ← HEAD, LTAG(HEAD) ← 1. Further TEXT references can be entered into
the tree using the following insertion algorithm:

Set K to the new key that we wish to enter. (This is the first reference the
insertion algorithm makes to the TEXT array.) Perform Algorithm P; it must terminate
unsuccessfully, since no key is allowed to be a prefix of another. (Step P6 makes the
second reference to the TEXT; no more references will be needed!) Now suppose that
the key located in step P6 agrees with the argument K in the first l bits, but differs
from it in position l + 1, where K has the digit b and the key has 1 − b. (Even though
the search in Algorithm P might have let j get much greater than l, it is possible to
prove that the procedure specified here will find the longest match between K and any
existing key. Thus, all keys of the text that start with the first l bits of K have 1 − b
as their (l + 1)st bit.) Now repeat Algorithm P with K replaced by these leading l bits
(thus, n ← l). This time the search will be successful, so we needn’t perform step P6.
Now set R ⇐ AVAIL, KEY(R) ← position of the new key in TEXT. If LLINK(Q) = P, set
LLINK(Q) ← R, t ← LTAG(Q), LTAG(Q) ← 0; otherwise set RLINK(Q) ← R, t ← RTAG(Q),
RTAG(Q) ← 0. If b = 0, set LTAG(R) ← 1, LLINK(R) ← R, RTAG(R) ← t, RLINK(R) ← P;
otherwise set RTAG(R) ← 1, RLINK(R) ← R, LTAG(R) ← t, LLINK(R) ← P. If t = 1, set
SKIP(R) ← 1 + l − j; otherwise set SKIP(R) ← 1 + l − j + SKIP(P) and SKIP(P) ← j − l − 1.

16.  The  tree  setup  requires  precisely  one  dotted  link  coming  from  below  a node  to  that 
node;  it  comes  from  that  part  of  the  tree  where  this  key  first  differs  from  all  others.  If 
there  is  no  such  part  of  the  tree,  the  algorithms  break  down.  We  could  simply  drop 
keys  that  are  prefixes  of  others,  but  then  the  algorithm  of  exercise  14  wouldn’t  have 
enough  data  to  find  all  occurrences  of  the  argument. 

17. If we define $a_0 = a_1 = 0$, then


= an  + l)kak/(mk  1 

fc>2  ' 


1)  = E(fc)(-1)^mfc_1/(^-1-i). 


18. To solve (4) we need the transform of $a_n = [n > 1]$, namely $\hat a_n = [n = 0] - 1 + n$;
hence for N > 2 we obtain $A_N = 1 - U_N + V_N$, where $U_N = K(N, 0, M)$ and
$V_N = K(N, 1, M)$ in the notation of exercise 19. Similarly, to solve (5), take $a_n = n - [n = 1] = \hat a_n$
and obtain $C_N = N + V_N$ for N > 2.

19. For $s = 1$, we have $V_n = K(n, 1, m) = n\bigl((\ln n + \gamma)/\ln m - \tfrac12 - \delta_0(n - 1)\bigr) + O(1)$,
and for $s \ge 2$ we have $K(n, s, m) = (-1)^s n\bigl(1/\ln m + \delta_{s-1}(n - s)\bigr)/s(s - 1) + O(1)$,
where

$$\delta_s(n) = \frac{2}{\ln m}\sum_{k\ge1}\Re\bigl(\Gamma(s - 2\pi ik/\ln m)\exp(2\pi ik\log_m n)\bigr)$$

is a periodic function of $\log n$. [In this derivation we have


AT(n+s,s,m)/(-l)s^n^S^ 


a+1  /*l/2+ioo 

2«  Jl/2-i, 


1/2  — too 


m3~l~z  — 1 


+ 0(n_s). 


For small m and s, the δ’s will be negligibly small; see exercise 5.2.2-46. Note that
$\delta_s(n - a) = \delta_s(n) + O(n^{-1})$ for fixed a.]

20. For (a), let $a_n = [n > s] = 1 - \sum_{k=0}^{s}[n = k]$; for (b), let $a_n = n - \sum_{k=0}^{s}k[n = k]$;
and  for  (c),  we  want  to  solve  the  recurrence 


f m1  nEfc(r)(m~1)n  kVk  for  n>s, 

Vn  ~ l (T)  for  n < s. 

Setting  xn  = y„  — n yields  a recurrence  of  the  form  of  exercise  17,  with 


an  = (1-M  1)'f2(k2)[n  = k]- 
k= 0 


Therefore,  in  the  notation  of  previous  exercises,  the  answers  are  (a)  1 — K(N,  0,M)  + 
K{N,1,M) + (—l)a~1K(N,s,M)  = N/(s\nM)  - N^^N)  + 80{N  - 1)  + 
Si(N  - 2)/2-l  + •••  + 8a-i(N  - s)/a(s  - 1))  + 0(1);  (b)  AT1  (AT  + K(N,1,M ) - 
2K(N,2,M)  + + (-l)-1aA’(iVJa,Af))  = (IniV  + 7 - H.-J/lnM  + 1/2  - 

(5o(JV— l)+6i(JV— 2)/lq \-8s-i(N—s)/(s  — l))+0(N~1)]  (c)  JV-1(Ar+(l-M“1)  x 

T,‘k=2(-1)k(2)K(N’k>M))  = 1 + |(1  - Af-1)((s  - l)/lnM  + <5X(JV  - 2)  + •••  + 
Sa-1(N  - s))  + 0{N~1). 

21. Let there be $A_N$ nodes in all. The number of nonnull pointers is $A_N - 1$, and the
number of nonpointers is N, so the total number of null pointers is $MA_N - A_N + 1 - N$.
To get the average number of null pointers in any fixed position, divide by M. [The
average value of $A_N$ appears in exercise 20(a).]

22. There is a node for each of the $M^l$ sequences of leading bits such that at least two
keys have this bit pattern. The probability that exactly k keys have a particular bit
pattern is

$$\binom{N}{k}M^{-lk}(1 - M^{-l})^{N-k},$$

so the average number of trie nodes on level l is $M^l\bigl(1 - (1 - M^{-l})^N\bigr) - N(1 - M^{-l})^{N-1}$.
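The closed form can be checked against brute force for tiny cases. The sketch assumes the random-trie model used here: the first l digits of each key are independent and uniform over the $M^l$ possible patterns, and a level-l node exists for every pattern shared by at least two keys. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_nodes(M, N, l):
    """M^l (1 - (1 - M^-l)^N) - N (1 - M^-l)^(N-1), exactly."""
    p = Fraction(1, M ** l)
    return M ** l * (1 - (1 - p) ** N) - N * (1 - p) ** (N - 1)


def expected_nodes_brute_force(M, N, l):
    """Average, over all equally likely assignments of length-l digit
    patterns to the N keys, of the number of patterns used at least twice."""
    patterns = M ** l
    total = 0
    for assignment in product(range(patterns), repeat=N):
        counts = {}
        for pattern in assignment:
            counts[pattern] = counts.get(pattern, 0) + 1
        total += sum(1 for c in counts.values() if c >= 2)
    return Fraction(total, patterns ** N)


print(expected_nodes(2, 3, 1), expected_nodes_brute_force(2, 3, 1))   # 1 1
```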

23. More generally, consider the case of arbitrary s as in exercise 20. If there are $a_l$
nodes on level l, they contain $a_{l+1}$ links and $Ma_l - a_{l+1}$ places where the search
might be unsuccessful. The average number of digit inspections will therefore be
$\sum_{l\ge0}(l + 1)M^{-l-1}(Ma_l - a_{l+1}) = \sum_{l\ge0}M^{-l}a_l$. Using the formula for $a_l$ in a random
trie, this equals


1 t K(N+1,1,M)-2K{N+1,2,M)  + ---  + (-1)3(s+1)K(N+1,s+1,M) 

24.  We  must  solve  the  recurrences  xo  = xi  = yo  = yi  = 0, 


+ ’ • • + Xnm  + ^ ^ [nj7^0]^ 

1<j  <m  ^ 


= an  + rn  n 

k 


yn~m E _ (m,..n,nm)  (2/ni+'“  + 2/n’"+  E Wi^]n3) 

nl  H Ynm—n  1 < t < j < m 

= bn  + m1~nJ2{nk)yk, 


for $n \ge 2$, where $a_n = m\bigl(1 - (1 - 1/m)^n\bigr)$ and $b_n = \tfrac12(m - 1)n\bigl(1 - (1 - 1/m)^{n-1}\bigr)$. By
exercises 17 and 18 the answers are (a) $x_N = N + V_N - U_N - [N = 1] = A_N + N - 1$
(a result that could have been obtained directly, since the number of nodes in the forest
is always $N - 1$ more than the number in the corresponding trie!); and (b) $y_N/N =
\tfrac12(M - 1)V_N/N = \tfrac12(M - 1)\bigl((\ln N + \gamma)/\ln M - \tfrac12 - \delta_0(N - 1)\bigr) + O(N^{-1})$.

25. (a) Let $A_N = M(N - 1)/(M - 1) - E_N$; then for $N \ge 2$, we have $(1 - M^{1-N})E_N =
M - 1 - M(1 - 1/M)^{N-1} + M^{1-N}\sum_{0\le k<N}\binom{N}{k}(M - 1)^{N-k}E_k$. Since $M - 1 \ge
M(1 - 1/M)^{N-1}$, we have $E_N \ge 0$ by induction. (b) By Theorem 1.2.7A with
$x = 1/(M - 1)$ and $n = N - 1$, we find $D_N = a_N + M^{1-N}\sum_k\binom{N}{k}(M - 1)^{N-k}D_k$,
where $a_1 = 0$ and $0 \le a_N \le M(1 - 1/M)^N/\ln M \le (M - 1)^2/M\ln M$ for $N \ge 2$. Hence
$0 \le D_N \le (M - 1)^2A_N/M\ln M \le (M - 1)(N - 1)/\ln M$.

26. Taking $q = \tfrac12$, $z = -\tfrac12$ in the second identity of exercise 5.1.1-16, we get $1/3 -
1/(3\cdot7) + 1/(3\cdot7\cdot15) - \cdots = 0.28879$; it’s slightly faster to use $z = -\tfrac14$ and take half of
the  result.  Alternatively,  Euler’s  formula  from  exercise  5.1.1-14  can  be  used,  involving 
only  negative  powers  of  2.  (John  Wrench  has  computed  the  value  to  40  decimal  digits, 
namely  0.28878  80950  86602  42127  88997  21929  23078  00889+.) 
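The numerical claims are easy to confirm with exact arithmetic. The sketch assumes the identity in question is the q-series form Σ_n z^n q^{n(n−1)/2}/((1 − q)…(1 − q^n)) = Π_{k≥0}(1 + zq^k); we have not checked exercise 5.1.1-16 itself. With q = ½ this gives the series above (which equals Π_{k≥1}(1 − 2^{−k})) when z = −½, and twice that product when z = −¼. Function names are hypothetical.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def partial_product(terms=150):
    """prod_{k=1}^{terms} (1 - 2^-k), computed exactly."""
    x = Fraction(1)
    for k in range(1, terms + 1):
        x *= 1 - Fraction(1, 2 ** k)
    return x


def q_series(z, terms=40):
    """sum_{n>=0} z^n q^(n(n-1)/2) / ((1-q)...(1-q^n)) with q = 1/2."""
    q = Fraction(1, 2)
    total, term = Fraction(0), Fraction(1)
    for n in range(terms):
        total += term
        term *= z * q ** n / (1 - q ** (n + 1))   # ratio of consecutive terms
    return total


def digits(x, places=45):
    """Decimal expansion of the fraction x to `places` significant digits."""
    with localcontext() as ctx:
        ctx.prec = places
        return str(Decimal(x.numerator) / Decimal(x.denominator))


print(digits(partial_product()))
print(digits(q_series(Fraction(-1, 2))))
print(digits(q_series(Fraction(-1, 4)) / 2))
```

The z = −¼ series converges faster because each term carries an extra factor 2^{−n}.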

27. (For fun, the following derivation goes to $O(N^{-1})$.) In the notation of exercises
5.2.2-38 and 5.2.2-48, we have


Cn  = Un  + N — 1 + 


Vat+i 
N + 1 


-aN- /?  + £(— 1)"2 


-n(n+l)/2^m>0 


zm>o 


■\-m\N 


rr=i(i-2-r) 


where 

$\alpha = 2/(1\cdot1) - 4/(3\cdot3\cdot1) + 8/(7\cdot7\cdot3\cdot1) - 16/(15\cdot15\cdot7\cdot3\cdot1) + \cdots \approx 1.60670$;
$\beta = 2/(1\cdot3\cdot1) - 4/(3\cdot7\cdot3\cdot1) + 8/(7\cdot15\cdot7\cdot3\cdot1) - \cdots \approx 0.60670$.

This numerical evaluation suggests that $\alpha = \beta + 1$, a fact that is not hard to prove.
Moreover, α turns out to be identical to the constant defined quite differently in 5.2.3-(19);
see Karl Dilcher, Discrete Math. 145 (1995), 83-93. We have $V_{N+1}/(N + 1) =$
Un+i  - UN,  and  the  value  of  £m>0(21_n)m(l  - 2~m)N  is  0(7V1-n),  by  exercise  5.2.2- 
46.  RenceCN  = UN+1-{a-l)N-a  + 0(N~1)  = (JV  + l)lg(iV  + l)+iV((7-l)/ln2  + 
½ − α + δ_{−1}(N)) + ½ − 1/ln 4 − α − ½δ_1(N) + O(N^{−1}), by exercise 5.2.2-50.
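The two series for α and β converge very quickly. A short floating-point sketch (ours, not from the text) reproduces α ≈ 1.60670 and β ≈ 0.60670. It also shows that α − β = 1 to machine precision.

```python
def alpha_beta(terms=30):
    """Partial sums of the two alternating series for alpha and beta."""
    alpha = beta = 0.0
    prod = 1                                   # (2-1)(4-1)...(2**n - 1)
    for n in range(1, terms + 1):
        prod *= 2**n - 1
        sign = 1 if n % 2 else -1
        # alpha: 2/(1*1) - 4/(3*3*1) + 8/(7*7*3*1) - ...
        alpha += sign * 2**n / ((2**n - 1) * prod)
        # beta:  2/(1*3*1) - 4/(3*7*3*1) + 8/(7*15*7*3*1) - ...
        beta += sign * 2**n / ((2**n - 1) * (2**(n + 1) - 1) * prod)
    return alpha, beta

alpha, beta = alpha_beta()
print(f"alpha = {alpha:.5f}, beta = {beta:.5f}, alpha - beta = {alpha - beta:.12f}")
```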

The  variance  of  the  internal  path  length  of  a digital  search  tree  has  been  computed 
by  Kirschenhofer,  Prodinger,  and  Szpankowski,  SICOMP  23  (1994),  598-616. 

28. The derivations in the text and exercise 27 apply to general M ≥ 2, if we substitute
M for 2 in the obvious places. Hence the average number of digit inspections
in a random successful search is C_N/N = U_{N+1}/N − α_M + 1 + O(N^{−1}) = log_M N +
(γ − 1)/ln M + ½ − α_M + δ_{−1}(N) + (log_M N)/N + O(N^{−1}); and for the unsuccessful
case it is C_{N+1} − C_N = V_{N+2}/(N + 2) − α_M + 1 + O(N^{−1}) = log_M N + γ/ln M + ½ −
α_M − δ_0(N + 1) + O(N^{−1}). Here δ_s(n) is defined in exercise 19, and


$$\alpha_M = \sum_{j\ge0} \frac{(-1)^j M^{j+1}}{(M^{j+1}-1)^2\,(M^j-1)\cdots(M-1)}.$$


29. Flajolet and Sedgewick [SICOMP 15 (1986), 748-767] have shown that the approximate
average number of such nodes is .372N when M = 2 and .689N when M = 16.
See also the generalization by Flajolet and Richmond, Random Structures & Algorithms
3 (1992), 305-320.

30. By iterating the recurrence, h_n(z) is the sum of all possible terms of the form

$$\binom{n}{p_1}\frac{z}{2^{p_1}-1}\binom{p_1}{p_2}\frac{z}{2^{p_2}-1}\cdots\frac{z}{2^{p_m}-1}\binom{p_m}{1},$$

for n > p_1 > ⋯ > p_m > 1.


31. h'_n(1) = V_n; see exercise 5.2.2-36(b). [For the variance and limiting distributions
of M-ary generalizations of Patricia trees, see P. Kirschenhofer and H. Prodinger,
Lecture  Notes  in  Comp.  Sci.  226  (1986),  177-185;  W.  Szpankowski,  JACM  37  (1990), 
691-711;  B.  Rais,  P.  Jacquet,  and  W.  Szpankowski,  SIAM  J.  Discrete  Math.  6 (1993), 
197-213.] 
32. The sum of the SKIP fields is the number of nodes in the corresponding binary
trie, so the answer is A_N (see exercise 20).

33. Here’s how (18) was discovered: A(2z) − 2A(z) = e^{2z} − 2e^z + 1 + A(z)(e^z − 1)
can be transformed into A(2z)/(e^{2z} − 1) = (e^z − 1)/(e^z + 1) + A(z)/(e^z − 1). Hence
A(z) = (e^z − 1) Σ_{j≥1}(e^{z/2^j} − 1)/(e^{z/2^j} + 1). Now if f(z) = Σ c_n z^n, then
Σ_{j≥1} f(z/2^j) = Σ c_n z^n/(2^n − 1). In this case f(z) = (e^z − 1)/(e^z + 1) = tanh(z/2), which equals
1 − 2z^{−1}(z/(e^z − 1) − 2z/(e^{2z} − 1)) = Σ_{n≥1} 2B_{n+1}(2^{n+1} − 1)z^n/(n + 1)!. From this
formula the route is apparent.
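The closing series can be checked numerically. This Python sketch computes Bernoulli numbers exactly, using the convention B_1 = −½ that matches z/(e^z − 1) = Σ B_n z^n/n!. It then compares partial sums of Σ_{n≥1} 2B_{n+1}(2^{n+1} − 1)z^n/(n + 1)! with tanh(z/2). The helper names are our own.

```python
from fractions import Fraction
from math import comb, factorial, tanh

def bernoulli_numbers(n_max):
    """B_0..B_n_max with B_1 = -1/2, from sum_{k<=m} C(m+1, k) B_k = 0."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def tanh_half_series(z, terms=40):
    """Partial sum of sum_{n>=1} 2 B_{n+1} (2**(n+1) - 1) z**n / (n+1)!."""
    B = bernoulli_numbers(terms + 1)
    return sum(float(2 * B[n + 1] * (2**(n + 1) - 1) / factorial(n + 1)) * z**n
               for n in range(1, terms + 1))

for z in (0.5, 1.0):
    print(z, tanh_half_series(z), tanh(z / 2))
```

The series has radius of convergence π (the poles of tanh(z/2) are at ±iπ), so 40 terms are ample for |z| ≤ 1.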

34. (a) Consider 1^{n−1} + ⋯ + (m − 1)^{n−1} = (B_n(m) − B_n)/n,
by exercise 1.2.11.2-4. (b) Let S_n(m) = Σ_{1≤k<m}(1 − k/m)^n and T_n(m) = 1/(e^{n/m} − 1).
If k ≤ m/2 we have e^{−kn/m} ≥ exp(n ln(1 − k/m)) ≥ exp(−kn/m − k^2n/m^2) ≥
e^{−kn/m}(1 − k^2n/m^2); hence (1 − k/m)^n = e^{−kn/m} + O(e^{−kn/m}k^2n/m^2). Since S_n(m) =
Σ_{1≤k≤m/2}(1 − k/m)^n + O(2^{−n}) and T_n(m) = Σ_{1≤k≤m/2} e^{−kn/m} + O(e^{−n/2}), we have S_n(m) =
T_n(m) + O(e^{−n/m}n/m^2). The sum of O(exp(−n/2^j)n/2^{2j}) is O(n^{−1}), because the sum
for j ≤ lg n is of order n^{−1}(1 + 2/e + (2/e)^2 + ⋯) and the sum for j > lg n is of order
n^{−1}(1 + 1/4 + (1/4)^2 + ⋯). (c) Argue as in Section 5.2.2 when |x| < 2π, then use
analytic  continuation,  (d)  | lg(n/7r)  + 7/(2  In  2)  - | + S(n)  + 2/n,  where 

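$$\delta(n) = \frac{2}{\ln 2}\sum_{k\ge1}\Re\Bigl(\zeta\bigl(-\tfrac{2\pi ik}{\ln 2}\bigr)\,\Gamma\bigl(-\tfrac{2\pi ik}{\ln 2}\bigr)\exp(2\pi ik\lg n)\Bigr)
= \frac{1}{\ln 2}\sum_{k\ge1}\frac{\Re\bigl(\zeta(1+\tfrac{2\pi ik}{\ln 2})\exp(2\pi ik\lg(n/\pi))\bigr)}{\cosh(\pi^2k/\ln 2)}.$$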

The  variance  and  higher  moments  have  been  calculated  by  W.  Szpankowski,  JACM  37 
(1990),  691-711. 

35. The keys must be {α0β0ω_1, α0β1ω_2, α1γ0ω_3, α1γ1δ0ω_4, α1γ1δ1ω_5}, where α, β, …
are strings of 0s and 1s with |α| = a − 1, |β| = b − 1, etc. The probability that
five random keys have this form is 5!·2^{a−1+b−1+c−1+d−1}/2^{a+b+a+b+a+c+a+c+d+a+c+d} =
5!/2^{4a+b+2c+d+4}.

36. Let there be n internal nodes. (a) (n!/2^I) Π(1/s(x)) = n! Π(1/(2^{s(x)−1}s(x))), where
I is the internal path length of the tree. (b) ((n + 1)!/2^n) Π(1/(2^{s(x)} − 1)). (Consider
summing the answer of exercise 35 over all a, b, c, d ≥ 1.)

37. The smallest modified external path length is actually 2 − 1/2^{N−2}, and it occurs
only  in  a degenerate  tree  (whose  external  path  length  is  maximal).  [One  can  prove  that 
the  largest  modified  external  path  length  occurs  if  and  only  if  the  external  nodes  appear 
on  at  most  two  adjacent  levels!  But  it  is  not  always  true  that  a tree  whose  external 
path  length  is  smaller  than  another  has  a larger  modified  external  path  length.] 

38. Consider as subproblems the finding of k-node trees with parameters
…, (α, 2^{k−n}β).

39.  See  Miyakawa,  Yuba,  Sugito,  and  Hoshi,  SICOMP  6 (1977),  201-234. 

40.  Let  N/r  be  the  true  period  length  of  the  sequence.  Form  a Patricia-like  tree,  with 
a_0a_1 … as the TEXT and with N/r keys starting at positions 0, 1, …, N/r − 1. (No key
is  a prefix  of  another,  because  of  our  choice  of  r.)  Also  include  in  each  node  a SIZE 
field,  containing  the  number  of  tagged  link  fields  in  the  subtree  below  that  node.  To  do 
the  specified  operation,  use  Algorithm  P;  if  the  search  is  unsuccessful,  the  answer  is  0, 
but  if  it  is  successful  and  j < n the  answer  is  r.  Finally  if  it  is  successful  and  j > n, 
the answer is r · SIZE(P).
43. The expected height is asymptotic to (1 + 1/s) log_M N, and the variance is O(1).
See H. Mendelson, IEEE Transactions SE-8 (1982), 611-619; P. Flajolet, Acta Informatica
20 (1983), 345-369; L. Devroye, Acta Informatica 21 (1984), 229-237; B. Pittel,
Advances in Applied Probability 18 (1986), 139-155; W. Szpankowski, Algorithmica
6 (1991), 256-277.

The average height of a random digital search tree with M = 2 is asymptotically
lg n + √(2 lg n) [Aldous and Shields, Probability Theory and Related Fields 79 (1988),
509-542], and the same is true for a random Patricia tree [Pittel and Rubin, Journal
of Combinatorial Theory A55 (1990), 292-312].

44.  See  SODA  8 (1997),  360-369;  this  search  structure  is  closely  related  to  the  multikey 
quicksort  algorithm  discussed  in  the  answer  to  exercise  5.2.2-30.  J.  Clement,  P.  Flajolet, 
and  B.  Vallee  have  shown  that  the  ternary  representation  makes  trie  searching  about 
three  times  faster  than  the  binary  representation  of  (2),  with  respect  to  nodes  accessed 
[see SODA 9 (1998), 531-539].

45.  The  probability  of  {THAT,  THE,  THIS}  before  {BUILT,  HOUSE,  IS,  JACK},  {HOUSE,  IS, 

JACK}  before  {BUILT},  {HOUSE,  IS}  before  {JACK},  {IS}  before  {HOUSE},  {THIS}  before 
{THAT,  THE},  and  {THE}  before  {THAT}  | = 


SECTION  6.4 

1. −37 ≤ rI1 ≤ 46. Therefore the locations preceding and following TABLE must be
guaranteed  to  contain  no  data  that  matches  any  given  argument;  for  example,  their 
first  byte  could  be  zero.  It  would  certainly  be  bad  to  store  K in  this  range!  [Thus  we 
might  say  that  the  method  in  exercise  6.3-4  uses  less  space,  since  the  boundaries  of 
that  table  are  never  exceeded.] 

2.  TOW.  [Can  the  reader  find  ten  common  words  of  at  most  5 letters  that  fill  all  the 
remaining  gaps  between  —10  and  30?] 

3. The alphabetic codes satisfy A + T = I + N and B − E = O − R, so we would have
either f(AT) = f(IN) or f(BE) = f(OR). Notice that instructions 4 and 5 of Table 1
resolve this dilemma rather well, while keeping rI1 from having too wide a range.

4. Consider cases with k pairs. The smallest n such that

$$\sum_k \frac{n!\,m!\,2^{-k}}{k!\,(n-2k)!\,(m-n+k)!\,m^n} < \frac12 \qquad\text{for } m = 365$$
is  88.  If  you  invite  88  people  (including  yourself),  the  chance  of  a birthday  trio  is 
.511065,  but  if  only  87  people  come  it  is  lowered  to  .499455.  See  C.  F.  Pinzka,  AMM 
67  (1960),  830. 
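The condition can be evaluated exactly. In this Python sketch (the names are illustrative), the term for k counts the assignments in which k days are shared by exactly two people and n − 2k days by one person each. The sum is therefore the probability that no day is shared by three people. The loop finds the smallest n for which this probability drops below ½.

```python
from fractions import Fraction
from math import factorial

def no_trio_probability(n, m=365):
    """Exact probability that no day is shared by three of n people (m days)."""
    count = 0
    for k in range(n // 2 + 1):                # k = number of shared pairs
        if n - k > m:                           # needs n - k distinct days
            continue
        count += factorial(n) * factorial(m) // (
            2**k * factorial(k) * factorial(n - 2 * k) * factorial(m - n + k))
    return Fraction(count, m**n)

def smallest_party(m=365):
    """Smallest n whose no-trio probability is below 1/2."""
    n = 1
    while no_trio_probability(n, m) >= Fraction(1, 2):
        n += 1
    return n

n = smallest_party()
print(n, float(1 - no_trio_probability(n)), float(1 - no_trio_probability(n - 1)))
```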

5.  The  hash  function  is  bad  since  it  assumes  at  most  26  different  values,  and  some 
of  them  occur  much  more  often  than  the  others.  Even  with  double  hashing  (letting 
h_2(K) = 1 plus the second byte of K, say, and M = 101) the search will be slowed
down  more  than  the  time  saved  by  faster  hashing.  Also  M = 100  is  too  small,  since 
FORTRAN  programs  often  have  more  than  100  distinct  variables. 

6.  Not  on  MIX,  since  arithmetic  overflow  will  almost  always  occur  (dividend  too  large). 
[It  would  be  nice  to  be  able  to  compute  (wK)  mod  M,  especially  if  linear  probing 
were  being  used  with  c = 1,  but  unfortunately  most  computers  disallow  this  since  the 
quotient  overflows.] 
7. If R(x) is a multiple of P(x), then R(α^j) = 0 in GF(2^k) for all j ∈ S. Let
R(x) = x^{a_1} + ⋯ + x^{a_s} where a_1 > ⋯ > a_s ≥ 0 and s ≤ t, and select t − s further
values a_{s+1}, …, a_t such that a_1, …, a_t are distinct nonnegative integers less than n.
The Vandermonde matrix

$$\begin{pmatrix} \alpha^{a_1} & \cdots & \alpha^{a_t} \\ \alpha^{2a_1} & \cdots & \alpha^{2a_t} \\ \vdots & & \vdots \\ \alpha^{ta_1} & \cdots & \alpha^{ta_t} \end{pmatrix}$$

is singular, since the sum of its first s columns is zero. But this contradicts the fact
that α^{a_1}, …, α^{a_t} are distinct elements of GF(2^k). (See exercise 1.2.3-37.)

[The  idea  of  polynomial  hashing  originated  with  M.  Hanan,  S.  Muroga,  F.  P. 
Palermo,  N.  Raver,  and  G.  Schay;  see  IBM  J.  Research  & Development  7 (1963), 
121-129;  U.S.  Patent  3311888  (1967).] 

8. By induction. The strong induction hypotheses can be supplemented by the fact
that {(−1)^k(rq_k + q_{k−1})θ} = (−1)^k(r(q_kθ − p_k) + (q_{k−1}θ − p_{k−1})) for 0 ≤ r ≤ a_k. The
“record low” values of {nθ} occur for n = q_1, q_2 + q_1, 2q_2 + q_1, …, a_2q_2 + q_1 = 0q_4 + q_3,
q_4 + q_3, …, a_4q_4 + q_3 = 0q_6 + q_5, …; the “record high” values occur for n = q_0,
q_1 + q_0, …, a_1q_1 + q_0 = 0q_3 + q_2, … . These are the steps when interval number 0 of a
new length is formed. [Further structure can be deduced by generalizing the Fibonacci
number system of exercise 1.2.8-34; see L. H. Ramshaw, J. Number Theory 13 (1981),
138-175.]

9. We have φ^{−1} = //1, 1, 1, …// and φ^{−2} = //2, 1, 1, …//. Let θ = //a_1, a_2, …//
and θ_k = //a_{k+1}, a_{k+2}, …//, and let Q_k = q_k + q_{k−1}θ_{k−1} in the notation of exercise 8.
If a_1 > 2, the very first break is bad. The three sizes of intervals in exercise 8 are,
respectively, (1 − rθ_{k−1})/Q_k, θ_{k−1}/Q_k, and (1 − (r − 1)θ_{k−1})/Q_k, so the ratio of the
first length to the second is (a_k − r) + θ_k. This will be less than ½ when r = a_k and
a_{k+1} ≥ 2; hence {a_2, a_3, …} must all equal 1 if there are to be no bad breaks. [For
related theorems, see R. L. Graham and J. H. van Lint, Canadian J. Math. 20 (1968),
1020-1024, and the references cited there.]
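The effect of the continued fraction can be seen experimentally. This Python sketch inserts the points {θ}, {2θ}, … into [0, 1) one at a time. For each insertion, it records the ratio of the shorter to the longer of the two new pieces. As our reading of "bad break", the sketch assumes that a break is bad when this ratio is less than ½.

```python
import bisect
import math

def worst_break(theta, n):
    """Insert {k*theta}, k = 1..n, into [0, 1); return the smallest ratio
    (shorter piece / longer piece) created by any single insertion."""
    points = [0.0, 1.0]
    worst = 1.0
    for k in range(1, n + 1):
        x = (k * theta) % 1.0
        i = bisect.bisect_left(points, x)
        a, b = x - points[i - 1], points[i] - x    # the two new pieces
        worst = min(worst, min(a, b) / max(a, b))
        points.insert(i, x)
    return worst

PHI_INV = (math.sqrt(5) - 1) / 2
for name, theta in [("phi^-1", PHI_INV), ("phi^-2", PHI_INV ** 2),
                    ("sqrt(2)-1", math.sqrt(2) - 1), ("1/pi", 1 / math.pi)]:
    print(f"{name:10s} worst ratio over 1000 points: {worst_break(theta, 1000):.4f}")
```

For φ^{−1} and φ^{−2}, every ratio stays near φ^{−1} ≈ 0.618. The value √2 − 1 = //2, 2, 2, …// fails at its second insertion. The value 1/π, whose first partial quotient is 3, fails immediately.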

10.  See  F.  M.  Liang’s  elegant  proof  in  Discrete  Math.  28  (1979),  325-326. 

11.  There  would  be  a problem  if  K = 0.  If  keys  were  required  to  be  nonzero  as 
in  Program  L,  this  change  would  be  worthwhile,  and  we  could  also  represent  empty 
positions  by  0. 

12. We can store K in KEY[0], replacing lines 14-19 by

          STA   TABLE(KEY)      A − S1
          CMPA  TABLE,2(KEY)    A − S1
          JE    3F              A − S1
    2H    ENT1  0,2             C − 1 − S2
          LD2   TABLE,1(LINK)   C − 1 − S2
          CMPA  TABLE,2(KEY)    C − 1 − S2
          JNE   2B              C − 1 − S2
    3H    J2Z   5F              A − S1
          ENT1  0,2             S2
          JMP   SUCCESS         S2

The time “saved” is C − 1 − 5A + S + 4S1 units, which is actually a net loss because
C is rarely more than 5. (An inner loop shouldn’t always be optimized!)

13.  Let  the  table  entries  be  of  two  distinguishable  types,  as  in  Algorithm  C,  with  an 
additional one-bit TAG[i] field in each entry. This solution uses circular lists, following
a suggestion  of  Allen  Newell,  with  TAG[i]  = 1 in  the  first  word  of  each  list. 
A1. [Initialize.] Set i ← j ← h(K) + 1, Q ← q(K).

A2. [Is there a list?] If TABLE[i] is empty, set TAG[j] ← 1 and go to A8. Otherwise
if TAG[i] = 0, go to A7.

A3. [Compare.] If Q = KEY[i], the algorithm terminates successfully.

A4. [Advance to next.] If LINK[i] ≠ j, set i ← LINK[i] and go back to A3.

A5. [Find empty node.] Decrease R one or more times until finding a value such
that TABLE[R] is empty. If R = 0, the algorithm terminates with overflow;
otherwise set LINK[i] ← R.

A6. [Prepare to insert.] Set i ← R, TAG[R] ← 0, and go to A8.

A7. [Displace a record.] Repeatedly set i ← LINK[i] one or more times until
LINK[i] = j. Then do step A5. Then set TABLE[R] ← TABLE[j], i ← j,
TAG[j] ← 1.

A8. [Insert new key.] Mark TABLE[i] as an occupied node, with KEY[i] ← Q,
LINK[i] ← j.

(Note that if TABLE[i] is occupied it is possible to determine the corresponding full
key K, given only the value of i. We have q(K) = KEY[i], and then if we set i ← LINK[i]
zero or more times until TAG[i] = 1 we will have h(K) = i − 1.)

14. According to the stated conventions, the notation “X ⇐ AVAIL” of 2.2.3-(6) now
stands for the following operations: “Set X ← AVAIL; then set X ← LINK(X) zero
or more times until either X = Λ (an OVERFLOW error) or TAG(X) = 0; finally set
AVAIL ← LINK(X).”

To insert a new key K: Set Q ⇐ AVAIL, TAG(Q) ← 1, and store K in this word.
[Alternatively, if all keys are short, omit this and substitute K for Q in what follows.]
Then set R ⇐ AVAIL, TAG(R) ← 1, AUX(R) ← Q, LINK(R) ← Λ. Set P ← h(K), and

if TAG(P) = 0, set TAG(P) ← 2, AUX(P) ← R;

if TAG(P) = 1, set S ⇐ AVAIL, CONTENTS(S) ← CONTENTS(P), TAG(P) ← 2,
AUX(P) ← R, LINK(P) ← S;

if TAG(P) = 2, set LINK(R) ← AUX(P), AUX(P) ← R.

To retrieve a key K: Set P ← h(K), and
if TAG(P) ≠ 2, K is not present;

if TAG(P) = 2, set P ← AUX(P); then set P ← LINK(P) zero or more times until
either P = Λ, or TAG(P) = 1 and either AUX(P) = K (if all keys are short)
or AUX(P) points to a word containing K (perhaps indirectly through words
with TAG = 2).

Elcock’s  original  scheme  [Comp.  J.  8 (1965),  242-243]  actually  used  TAG  = 2 and 
TAG  = 3 to  distinguish  between  lists  of  length  one  (when  we  can  save  one  word  of 
space)  and  longer  lists.  This  is  a worthwhile  improvement,  since  we  presumably  have 
such  a large  hash  table  that  almost  all  lists  have  length  one. 

Another  way  to  place  a hash  table  “on  top  of”  a large  linked  memory,  using 
coalescing  lists  instead  of  separate  chaining,  has  been  suggested  by  J.  S.  Vitter  [Inf. 
Proc.  Letters  13  (1981),  77-79]. 

15. Knowing that there is always an empty node makes the inner search loop faster,
since we need not maintain a counter to determine how many times step L2 is
performed. The shorter program amply compensates for this one wasted cell. [On the
other hand, there is a neat way to avoid the variable N and to allow the table to
become completely full, in Algorithm L, without slowing down the method appreciably
except when the table actually does overflow: Simply check whether i < 0 happens
twice! This trick does not apply to Algorithm D.]

16.  No:  0 always  leads  to  SUCCESS,  whether  it  has  been  inserted  or  not,  and  SUCCESS 
occurs  with  different  values  of  i at  different  times. 

17.  The  second  probe  would  then  always  be  to  position  0. 

18.  The  code  in  (31)  costs  about  3(A  — SI)  units  more  than  (30),  and  it  saves  4 u 
times  the  difference  between  (26),  (27),  and  (28),  (29).  For  a successful  search,  (31) 
is  advantageous  only  when  the  table  is  more  than  about  94  percent  full,  and  it  never 
saves  more  than  about  |m  of  time.  For  an  unsuccessful  search,  (31)  is  advantageous 
when  the  table  is  more  than  about  71  percent  full. 

20. We want to show that

$$\binom{j}{2} \equiv \binom{k}{2} \pmod{2^m} \quad\text{and}\quad 1 \le j \le k \le 2^m$$

implies j = k. Observe that the congruence j(j − 1) ≡ k(k − 1) (modulo 2^{m+1}) implies
(k − j)(k + j − 1) ≡ 0. If k − j is odd, k + j − 1 must be a multiple of 2^{m+1}, but that’s
impossible since 2 ≤ k + j − 1 ≤ 2^{m+1} − 2. Hence k − j is even, so k + j − 1 is odd, so
k − j is a multiple of 2^{m+1}, so k = j. [Conversely, if M is not a power of 2, this probe
sequence does not work.]

The probe sequence has secondary clustering, and it increases the running time of
Program D (as modified in (30)) by about ½(C − 1) − (A − S1) units since B ≈ \binom{C+1}{2}/M
will now be negligible. This is a small improvement, until the table gets about 60
percent full.
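The first paragraph can be checked by brute force. This Python sketch lists the table sizes M for which the offsets \binom{j}{2} mod M, 1 ≤ j ≤ M, hit every position. It models only the offsets, not Program D itself.

```python
def triangular_offsets(M):
    """Probe offsets C(j, 2) mod M for j = 1, ..., M."""
    return [(j * (j - 1) // 2) % M for j in range(1, M + 1)]

def visits_every_slot(M):
    return len(set(triangular_offsets(M))) == M

print([M for M in range(1, 41) if visits_every_slot(M)])   # [1, 2, 4, 8, 16, 32]
```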

21.  If  N is  decreased,  Algorithm  D can  fail  since  it  might  reach  a state  with  no  empty 
spaces  and  loop  indefinitely.  On  the  other  hand,  if  N isn’t  decreased,  Algorithm  D 
might  signal  overflow  when  there  still  is  room.  The  latter  alternative  is  the  lesser  of  the 
two  evils,  because  rehashing  can  be  used  to  get  rid  of  deleted  cells.  (In  the  latter  case 
Algorithm  D should  increase  N and  test  for  overflow  only  when  inserting  an  item  into 
a previously  empty  position,  since  N represents  the  number  of  nonempty  positions.) 
We  could  also  maintain  two  counters. 

22. Suppose that positions j − 1, j − 2, …, j − k are occupied and j − k − 1 is empty
(modulo M). The keys that probe position j and find it occupied before being inserted
are precisely those keys in positions j − 1 through j − k whose hash address does not
lie between j − 1 and j − k; such problematical keys appear in the order of insertion.
Algorithm  R moves  the  first  such  key  into  position  j , and  repeats  the  process  on  a 
smaller  range  of  problematical  positions  until  no  problematical  keys  remain. 

23.  A deletion  scheme  for  coalesced  chaining  devised  by  J.  S.  Vitter  [J.  Algorithms  3 
(1982),  261-275]  preserves  the  distribution  of  search  times. 

24. We have P(P − 1)(P − 2)P(P − 1)P(P − 1)/(MP(MP − 1) … (MP − 6)) =
M^{−7}(1 − (5 − 21/M)P^{−1} + O(P^{−2})). In general, the probability of a hash sequence
a_1 … a_N is (Π_{j=0}^{M−1} P^{\underline{b_j}})/(MP)^{\underline{N}} = M^{−N} + O(P^{−1}), where b_j is the number of a_i that
equal j.

25. Let the (N + 1)st key hash to location a; P_k is M^{−N} times the number of hash
sequences that leave the k locations a, a − 1, …, a − k + 1 (modulo M) occupied and
a − k empty. The number of such sequences with a + 1, …, a + t occupied and a + t + 1
empty is g(M, N, t + k), by circular symmetry of the algorithm.

26. 2!2M!l!/(3,2)/(3’2)/(5’4)/(2’1) = 2235547 = 4252500‘

27. Following the hint,

$$s(n,x,y) = \sum_k \binom{n}{k} x(x+k)^{k}(y-k)^{n-k-1}(y-n) + n\sum_k \binom{n-1}{k-1}(x+k)^{k}(y-k)^{n-k-1}(y-n).$$

In the first sum, replace k by n − k and apply Abel’s formula; in the second, replace k
by k + 1. Now

$$g(M,N,k) = \binom{N}{k}(k+1)^{k-1}(M-k-1)^{N-k-1}(M-N-1),$$

with 0/0 = 1 when k = N = M − 1, and

$$M^N\sum_{k\ge0}(k+1)P_k = \sum_{k\ge0}\binom{k+2}{2}g(M,N,k) = \frac12\Bigl(\sum_{k\ge0}(k+1)g(M,N,k) + \sum_{k\ge0}(k+1)^2 g(M,N,k)\Bigr).$$

The first sum is M^N Σ_k P_k = M^N, and the second is s(N, 1, M − 1) = M^N + 2NM^{N−1} +
3N(N − 1)M^{N−2} + ⋯ = M^N Q_1(M, N). [See J. Riordan, Combinatorial Identities
(New York: Wiley, 1968), 18-23, for further study of sums like s(n, x, y).]

28. Let t(n, x, y) = Σ_{k≥0} \binom{n}{k}(x + k)^{k+2}(y − k)^{n−k−1}(y − n); then as in exercise 27 we
find t(n, x, y) = x s(n, x, y) + n t(n − 1, x + 1, y − 1), and t(N, 1, M − 1) = M^N(3Q_3(M, N) −
2Q_2(M, N)). Thus Σ(k + 1)^2 P_k = M^{−N} Σ(⅓(k + 1)^3 + ½(k + 1)^2 + ⅙(k + 1))g(M, N, k) =
Q_3(M, N) − ⅔Q_2(M, N) + ½Q_1(M, N) + ⅙. Subtracting (C'_N)^2 gives the variance, which
is approximately ¾(1 − α)^{−4} − ⅔(1 − α)^{−3} − 1/12. The standard deviation is often larger
than the mean; for example, when α = .9 the mean is 50.5 and the standard deviation
is ½√27333 ≈ 82.7.
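For small tables, the exact formulas can be confirmed by brute force. The Python sketch below inserts every hash sequence with linear probing, probing downward with wraparound as in Algorithm L. It measures every unsuccessful search and compares the first two moments with ½(1 + Q_1) and Q_3 − ⅔Q_2 + ½Q_1 + ⅙. Here Q_r(M, N) = Σ_k \binom{r+k}{k}N(N − 1)⋯(N − k + 1)/M^k is our transcription of the Q functions used above. The sketch requires N < M.

```python
from fractions import Fraction
from itertools import product
from math import comb, sqrt

def Q(r, M, N):
    """Q_r(M, N) = sum_k C(r+k, k) * N(N-1)...(N-k+1) / M**k."""
    total, falling = Fraction(0), 1
    for k in range(N + 1):
        total += Fraction(comb(r + k, k) * falling, M**k)
        falling *= N - k
    return total

def unsuccessful_moments(M, N):
    """Exact E[C'] and E[C'^2] over all M**N hash sequences and M start cells."""
    s1 = s2 = 0
    for hashes in product(range(M), repeat=N):
        occupied = [False] * M
        for h in hashes:                      # insertion, probing downward
            i = h
            while occupied[i]:
                i = (i - 1) % M
            occupied[i] = True
        for a in range(M):                    # unsuccessful search from a
            probes, i = 1, a
            while occupied[i]:
                i = (i - 1) % M
                probes += 1
            s1 += probes
            s2 += probes * probes
    count = M ** (N + 1)
    return Fraction(s1, count), Fraction(s2, count)

mean, second = unsuccessful_moments(5, 3)
print(mean, (1 + Q(1, 5, 3)) / 2)
print(second, Q(3, 5, 3) - Fraction(2, 3) * Q(2, 5, 3) + Q(1, 5, 3) / 2 + Fraction(1, 6))

alpha = 0.9                                   # asymptotic mean and standard deviation
approx_mean = (1 + (1 - alpha) ** -2) / 2
approx_sd = sqrt(0.75 * (1 - alpha) ** -4 - (2 / 3) * (1 - alpha) ** -3 - 1 / 12)
print(approx_mean, approx_sd)
```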

29. Let M = m + 1, N = n; the safe parking sequences are precisely those in which
location 0 is empty when Algorithm L is applied to the hash sequence (M − a_1) … (M − a_n).
Hence the answer is f(m + 1, n) = (m + 1)^n − n(m + 1)^{n−1}. [This problem originated
with A. G. Konheim and B. Weiss, SIAM J. Applied Math. 14 (1966), 1266-1274; see
also R. Pyke, Annals of Math. Stat. 30 (1959), 568-576, Lemma 1.]
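The count can be confirmed by enumeration. This Python sketch assumes the usual one-way-street model. There are m spaces numbered 1 to m. Car j prefers space a_j and takes the first empty space at or after it. A sequence is safe if every car parks. The sketch compares the count with (m + 1)^n − n(m + 1)^{n−1}.

```python
from itertools import product

def all_park(prefs, m):
    """Cars arrive in order; each takes the first empty space >= its preference."""
    taken = [False] * (m + 1)               # spaces 1..m (index 0 unused)
    for a in prefs:
        s = a
        while s <= m and taken[s]:
            s += 1
        if s > m:
            return False
        taken[s] = True
    return True

def count_safe(m, n):
    return sum(all_park(p, m) for p in product(range(1, m + 1), repeat=n))

for m in range(1, 6):
    for n in range(1, m + 1):
        print(m, n, count_safe(m, n), (m + 1) ** n - n * (m + 1) ** (n - 1))
```

When m = n, the formula reduces to (n + 1)^{n−1}, the total that appears again in exercise 30.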

30. Obviously if the cars get parked they define such a permutation. Conversely, if
p_1p_2 … p_n exists, let q_1q_2 … q_n be the inverse permutation (q_i = j if and only if p_j = i),
and let b_i be the number of a_j that equal i. Every car will be parked if we can prove
that b_n ≤ 1, b_{n−1} + b_n ≤ 2, etc.; equivalently b_1 ≥ 1, b_1 + b_2 ≥ 2, etc. But this is clearly
true, since the k elements a_{q_1}, …, a_{q_k} are all ≤ k.

[Let r_j be the “left influence” of q_j, namely r_j = k if and only if q_{j−1} < q_j, …,
q_{j−k+1} < q_j and either j = k or q_{j−k} > q_j. Of all permutations p_1 … p_n that dominate
a given wakeup sequence a_1 … a_n, the “park immediately” algorithm finds the smallest
one (in lexicographic order). Konheim and Weiss observed that the number of wakeup
sequences leading to a given permutation p_1 … p_n is Π_{i=1}^{n} r_i; it is remarkable that the
sum of these products, taken over all permutations q_1 … q_n, is (n + 1)^{n−1}.]

31. Many interesting connections are possible, and the following three are the author’s
favorites [see also Foata and Riordan, Æquat. Math. 10 (1974), 10-22]:
a) In the notation of the previous answer, the counts b_1, b_2, …, b_n correspond to a
full parking sequence if and only if (b_1, b_2, …, b_n, 0) is a valid sequence of degrees of tree
nodes in preorder. (Compare with 2.3.3-(9), which illustrates postorder.) Every such
tree corresponds to n!/(b_1! … b_n!) distinct labeled free trees on {0, …, n}, since we can let
0 be the label of the root, and for k = 1, 2, …, n we can successively choose the labels
of the children of the kth node in preorder in (b_k + ⋯ + b_n)!/(b_k!(b_{k+1} + ⋯ + b_n)!) ways
from the remaining unused labels, attaching labels from left to right in increasing order.
And every such sequence of counts corresponds to n!/(b_1! … b_n!) wakeup sequences.

b) Dominique Foata has given the following pretty one-to-one correspondence: Let
a_1 … a_n be a safe parking sequence, which leaves car q_j parked in space j. A labeled
free tree on {0, 1, …, n} is constructed by drawing a line from j to 0 when a_j = 1, and
from j to q_{a_j−1} otherwise, for 1 ≤ j ≤ n. (Think of the tree nodes as cars; car j is
connected to the car that eventually winds up parked just before where wife j woke
up.) For example, the wakeup times 314159265 lead to the free tree



    0(2(7), 4(1(3(5(8), 9(6)))))


by  Foata’s  rule.  Conversely,  the  sequence  of  parked  cars  may  be  obtained  from  the  tree 
by  topological  sorting,  assuming  that  arrows  emanate  from  the  root  0 and  choosing  the 
smallest “source” at each step. From this sequence, a_1 … a_n can be reconstructed.
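Foata's rule is easy to mechanize. The Python sketch below parks the cars and builds the edge set. It reproduces the tree for the wakeup times 314159265. It also checks that, for n = 4, the 125 safe sequences give 125 distinct labeled trees on {0, …, 4}. The helper names are ours.

```python
from itertools import product

def park(prefs):
    """Park cars 1..n in spaces 1..n; car j takes the first empty space at or
    after prefs[j-1]. Return q with q[s] = car in space s, or None on failure."""
    n = len(prefs)
    q = [None] * (n + 1)
    for car, a in enumerate(prefs, start=1):
        s = a
        while s <= n and q[s] is not None:
            s += 1
        if s > n:
            return None
        q[s] = car
    return q

def foata_tree(prefs):
    """Edges {j, 0} if a_j = 1, else {j, q_(a_j - 1)}, for j = 1..n."""
    q = park(prefs)
    if q is None:
        raise ValueError("not a safe parking sequence")
    return {frozenset((j, 0 if a == 1 else q[a - 1]))
            for j, a in enumerate(prefs, start=1)}

def is_spanning_tree(edges, n):
    """True if the edges form a tree on the vertices 0..n."""
    adjacent = {v: set() for v in range(n + 1)}
    for edge in edges:
        u, v = tuple(edge)
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(edges) == n and len(seen) == n + 1

example = foata_tree([3, 1, 4, 1, 5, 9, 2, 6, 5])
print(sorted(tuple(sorted(e)) for e in example))

n = 4
trees = [foata_tree(p) for p in product(range(1, n + 1), repeat=n)
         if park(p) is not None]
print(len(trees), len({frozenset(t) for t in trees}))
```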

c) First construct an auxiliary tree by letting the parent of node k be the first
element > k that follows k in the permutation q_1 … q_n; if there’s no such element,
let the parent be 0. Then make a copy of the auxiliary tree and relabel the nonzero
nodes of the new tree by proceeding as follows, in preorder: If the label of the current
node was k in the auxiliary tree, swap its current label with the label that is currently
(1 + p_k − a_k)th smallest in its subtree. For example,


    auxiliary tree:  0(6, 9(8(7(5(4(2), 3(1))))))
    final tree:      0(6, 4(2(8(1(9(7), 3(5))))))


To  reverse  the  procedure,  we  can  reconstruct  the  auxiliary  tree  by  proceeding  in 
preorder  to  swap  the  label  of  each  node  with  the  largest  label  currently  in  its  subtree. 

Constructions (a) and (b) are strongly related, but construction (c) is quite
different. It has the interesting property that the sum of displacements of cars from their
preferred locations is equal to the number of inversions in the tree, namely the number of pairs
of labels a > b where a is an ancestor of b. This relation between parking sequences
and tree inversions was first discovered by G. Kreweras [Periodica Math. Hung. 11
(1980), 309-320]. The fact that tree inversions are intimately related to connected
graphs [Mallows and Riordan, Bull. Amer. Math. Soc. 74 (1968), 92-94] now makes
it possible to deduce that the sum of \binom{D(p)}{k}, taken over all parking sequences, where
D(p) = (p_1 − a_1) + ⋯ + (p_n − a_n), is equal to the total number of connected graphs
with n + k edges on the labeled vertices {0, 1, …, n}. [See equations (2.11), (3.5), and
(8.13) in the paper by Janson, Łuczak, Knuth, and Pittel, Random Struct. & Alg. 4
(1993), 233-358.]
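Construction (c) can be sketched in the same way. This Python version is an illustration under our reading of the steps, with node names equal to car numbers. It builds the auxiliary tree from q_1 … q_n and then swaps labels in preorder using rank 1 + p_k − a_k. For every safe sequence with n ≤ 4, it compares the inversions of the final tree with the total displacement D(p).

```python
from itertools import product

def park(prefs):
    """q[s] = car parked in space s (1-based), or None if some car fails."""
    n = len(prefs)
    q = [None] * (n + 1)
    for car, a in enumerate(prefs, start=1):
        s = a
        while s <= n and q[s] is not None:
            s += 1
        if s > n:
            return None
        q[s] = car
    return q

def construction_c(prefs):
    """Return (parent, label): parent[k] in the auxiliary tree (root 0) and
    label[k], the label that node k carries in the final tree."""
    n = len(prefs)
    q = park(prefs)
    p = [0] * (n + 1)                        # p[k] = space where car k parks
    for s in range(1, n + 1):
        p[q[s]] = s
    seq = q[1:]
    parent = [0] * (n + 1)
    for i, k in enumerate(seq):              # first later element larger than k
        parent[k] = next((x for x in seq[i + 1:] if x > k), 0)
    children = [[] for _ in range(n + 1)]
    for k in range(1, n + 1):
        children[parent[k]].append(k)

    def subtree(v):
        nodes = [v]
        for c in children[v]:
            nodes.extend(subtree(c))
        return nodes

    label = list(range(n + 1))               # start from a copy of the auxiliary tree

    def visit(v):                            # preorder relabeling
        if v:
            rank = 1 + p[v] - prefs[v - 1]
            target = sorted(subtree(v), key=lambda u: label[u])[rank - 1]
            label[v], label[target] = label[target], label[v]
        for c in children[v]:
            visit(c)

    visit(0)
    return parent, label

def inversions(parent, label):
    """Number of ancestor/descendant pairs whose labels decrease downward."""
    count = 0
    for v in range(1, len(parent)):
        u = parent[v]
        while u:                             # the root 0 never contributes
            count += label[u] > label[v]
            u = parent[u]
    return count

parent, label = construction_c([3, 1, 4, 1, 5, 9, 2, 6, 5])
print(parent[1:], label[1:], inversions(parent, label))
```

For the wakeup times 314159265, the displacements are 0, 0, 0, 1, 0, 0, 4, 1, 3. They sum to 9, which matches the 9 inversions of the final tree.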

32. Let

$$s_j = \sum_{k=0}^{j} (b_{k \bmod M} - 1).$$

Then, as observed by Svante Janson, we have c_j = max_{k≥j}(s_k − s_j), a quantity that is
well defined because lim_{k→∞} s_k = −∞.

The solution can be found by defining c_{M−1}, c_{M−2}, … on the assumption that
c_0 = 0; then if c_0 turns out to be greater than 0, it suffices to redefine c_{M−1}, c_{M−2},
… until no more changes are made.

33. The individual probabilities are not independent, since the condition b_0 + b_1 + ⋯ +
b_{M−1} = N was not taken into account; the derivation allows a nonzero probability that
Σ b_j has any given nonnegative value. Equations (46) are not strictly correct; they
imply, for example, that q_k is positive for all k, contradicting the fact that c_j can never
exceed N − 1.

Gaston Gonnet and Ian Munro [J. Algorithms 5 (1984), 451-470] have found an
interesting way to derive the exact result from the argument leading up to (51) by
introducing a useful operation called the Poisson transform of a sequence ⟨A_{mn}⟩: We
have e^{−mz} Σ_n A_{mn}(mz)^n/n! = Σ_k a_k z^k if and only if A_{mn} = Σ_k a_k n^{\underline{k}}/m^k.

34. (a) There are \binom{N}{k} ways to choose the set of j such that a_j has a particular value,
and (M − 1)^{N−k} ways to assign values to the other a’s. Therefore

$$P_{Nk} = \binom{N}{k}(M-1)^{N-k}/M^N.$$

(b) P_N(z) = B(z) in (50). (c) Consider the total number of probes to find all keys, not
counting the fetching of the pointer in the list head table of Fig. 38 if such a table is
used. A list of length k contributes \binom{k+1}{2} to the total; hence

$$C_N = M\sum_k\binom{k+1}{2}P_{Nk}/N = (M/N)\bigl(\tfrac12 P_N''(1) + P_N'(1)\bigr).$$

(d) In case (i) a list of length k requires k probes (not counting the list-head fetch), while
in case (ii) it requires k + δ_{k0}. Thus in case (ii) we get C'_N = Σ_k(k + δ_{k0})P_{Nk} = P'_N(1) +
P_N(0) = N/M + (1 − 1/M)^N ≈ α + e^{−α}, while case (i) has simply C'_N = N/M = α.
The formula MC'_N = M − N + NC_N applies in case (iii), since M − N hash addresses
will discover an empty table position while N will cause searching to the end of some
list from a point within it; this yields (18).

35. (i) Σ_k(1 + ½k − (k + 1)^{−1})P_{Nk} = 1 + N/(2M) − M(1 − (1 − 1/M)^{N+1})/(N + 1) ≈
1 + ½α − (1 − e^{−α})/α. (ii) Add P_{N0} = (1 − 1/M)^N ≈ e^{−α} to the result of (i).

(iii) Assume that when an unsuccessful search begins at the jth element of a list of
length k, the given key has random order with respect to the other k elements, so
the expected length of search is (j·1 + 2 + ⋯ + (k + 1 − j) + (k + 1 − j))/(k + 1).
Summing on j now gives MC'_N = M − N + M Σ_k(k^3 + 9k^2 + 2k)P_{Nk}/(6k + 6) =
M − N + M(⅙N(N − 1)/M^2 + (3/2)N/M − 1 + (M/(N + 1))(1 − (1 − 1/M)^{N+1})); hence
C'_N ≈ ½α + ⅙α^2 + (1 − e^{−α})/α.

36. (i) N/M − N/M^2. (ii) Σ_k(δ_{k0} + k)^2 P_{Nk} = Σ_k(δ_{k0} + k^2)P_{Nk} = P_N(0) + P_N''(1) +
P_N'(1). Subtracting (C'_N)^2 gives the answer, (M − 1)N/M^2 + (1 − 1/M)^N(1 − 2N/M −
(1 − 1/M)^N) ≈ α + e^{−α}(1 − 2α − e^{−α}) ≤ 1 − e^{−1} − e^{−2} = 0.4968. [For data structure
(iii), a more complicated analysis like that in exercise 37 would be necessary.]
37. Let S_N be the average value of (C − 1)^2, considering all M^N N choices of hash
sequences and keys to be equally likely. Then

$$\begin{aligned}
M^N N S_N &= \frac13\sum\binom{N}{k_1,\ldots,k_M}\Bigl(k_1\bigl(k_1-\tfrac12\bigr)(k_1-1)+\cdots+k_M\bigl(k_M-\tfrac12\bigr)(k_M-1)\Bigr)\\
&= \frac13 MN(N-1)(N-2)\sum_k\binom{N-3}{k-3}(M-1)^{N-k} + \frac12 MN(N-1)\sum_k\binom{N-2}{k-2}(M-1)^{N-k}\\
&= \frac13 MN(N-1)(N-2)M^{N-3} + \frac12 MN(N-1)M^{N-2}.
\end{aligned}$$

The variance is S_N − ((N − 1)/2M)^2 = (N − 1)(N + 6M − 5)/12M^2 ≈ ½α + α^2/12.

See  CMath  §8.5  for  interesting  connections  between  the  total  variance  calculated 
here  and  two  other  notions  of  variance:  the  variance  (over  random  hash  tables)  of  the 
average  number  of  probes  (over  all  items  present),  and  the  average  (over  random  hash 
tables)  of  the  variance  of  the  number  of  probes  (over  all  items  present).  The  total 
variance  is  always  the  sum  of  the  other  two;  and  in  this  case  the  variance  of  the  average 
number of probes is (M − 1)(N − 1)/(2M^2N).
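This Python sketch is a brute-force check of the variance for small M and N. It assumes each key is appended to the end of its chain, so the key that brings a chain to length c is found with c probes. It averages over all M^N hash sequences and all N keys.

```python
from fractions import Fraction
from itertools import product

def chaining_moments(M, N):
    """Exact mean and variance of C - 1 over all M**N hash sequences and all N
    keys, where C is the position of a key in its separately chained list."""
    s1 = s2 = 0
    for hashes in product(range(M), repeat=N):
        length = [0] * M
        for h in hashes:
            length[h] += 1                    # key becomes entry number length[h]
            s1 += length[h] - 1
            s2 += (length[h] - 1) ** 2
    count = M**N * N
    mean = Fraction(s1, count)
    return mean, Fraction(s2, count) - mean**2

for M, N in [(2, 2), (3, 4), (4, 5)]:
    mean, var = chaining_moments(M, N)
    print(M, N, mean, var, Fraction((N - 1) * (N + 6 * M - 5), 12 * M * M))
```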

38. The average number of probes is Σ_k P_{Nk}(2H_{k+1} − 2 + δ_{k0}) in the unsuccessful
case, and (M/N) Σ_k P_{Nk} k(2(1 + 1/k)H_k − 3) in the successful case, by Eqs. 6.2.2-(5) and
6.2.2-(6). These sums are 2f(N) + 2M(1 − (1 − 1/M)^{N+1})/(N + 1) + (1 − 1/M)^N − 2
and 2(M/N)f(N) + 2f(N − 1) + 2M(1 − (1 − 1/M)^N)/N − 3, respectively, where
f(N) = Σ_k P_{Nk}H_k. Exercise 5.2.1-40 tells us that f(N) = ln α + γ + E_1(α) + O(M^{−1})
when N = αM, M → ∞.

[Tree  hashing  was  first  proposed  by  P.  F.  Windley,  Comp.  J.  3 (1960),  84-88.  The 
analysis  in  the  previous  paragraph  shows  that  tree  hashing  is  not  enough  better  than 
simple  chaining  to  justify  the  extra  link  fields;  the  lists  are  short  anyway.  Moreover, 
when  M is  small,  tree  hashing  is  not  enough  better  than  pure  tree  search  to  justify  the 
hashing  time.] 

39.  (This  approach  to  the  analysis  of  Algorithm  C was  suggested  by  J.  S.  Vitter.) 
We  have  cjv+i(fc)  = (M  — fc)cjv(fc)  + (k  — 1 )cjv(fc  — 1)  for  k > 2,  and  furthermore 
X]fcc^(fc)  = NMn  . Hence 

Sn+i  = ^(2 )cN+i(*0  = E(2)((M  - k)cN{k)  + (k-  1 )cN{k  - 1)) 

k>2  k ^ 2 

= E((M  + 2)(2)+*)c"W  = (M  + 2)Sn  + NMn. 

Consequently  Sn  = (N  - 1 )MN~1  + (N  - 2 )MN~2(M  + 2)  H 1-  M(M  + 2)JV~2  = 

\ (M(M  + 2)n  - Mn+1  - 2 NMn). 

Consider  the  total  number  of  probes  in  unsuccessful  searches,  summed  over  all 
M values  of  h(K)\  each  list  of  length  k contributes  k + Sko  + (£)  to  the  total,  hence 
MN+1C'N  = MN+1  + SN. 
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40.  Define  Un  to  be  like  Sn  in  exercise  39,  but  with  (£)  replaced  by  (fc^1) . We  find 
UN+ 1 = (M  + 3 )UN  + SN  + NMn,  hence 

UN  = ±{Mn(M  - 6 N)  - 9 M(M  + 2)n  + 8 M(M  + 3)^). 

The  variance  is  2Un/Mn+1  + C'N  — (C'N)2,  which  approaches 

JA  _ i _ 12  , ,1  _ 5 \ 2a  , 4 3a  _ J_  4a 
144  12“  4U  8/e  ' 9 e 16 e 

for  TV  = aM,  M — ¥ oo.  When  a = 1 this  is  about  2.65,  so  the  standard  deviation  is 
bounded  by  1.63.  [Svante  Janson  (to  appear)  has  found  the  asymptotic  moments  of  all 
orders,  also  when  the  search  is  successful.] 

41.  Let  Vjv  be  the  average  length  of  the  block  of  occupied  cells  at  the  “high”  end  of 
the  table.  The  probability  that  this  block  has  length  k is  Anis{M  — 1 — k)^^/MN, 
where  A^k  is  the  number  of  hash  sequences  (35)  such  that  Algorithm  C leaves  the  first 
N — k and  the  last  k cells  occupied  and  such  that  the  subsequence  12...  N—k  appears 
in  increasing  order.  Therefore 

MnVn  = Zk  kANk(M  - 1 - = Mn+1  - J2k(M  - k)ANk(M  - 1 - fc)— 

= MN+1  - ( M-N ) ANk(M  - fc)—  = MN+1  - (M  - N){M  + 1)N. 

Now  Tjv  = (N/M)(1+Vn  — To Tjv-i),  since  ToH hTv-i  is  the  average  number 

of  times  R has  previously  decreased  and  N/M  is  the  probability  that  it  decreases  on 
the  current  step.  The  solution  to  this  recurrence  is  Tn  = (N/M)(  1 + 1 /M)N . (Such  a 
simple  formula  deserves  a simpler  proof!) 

42.  SIn  is  the  number  of  items  that  were  inserted  with  A = 0,  divided  by  N. 

43.  Let  N = aM'  and  M = /3M',  and  let  e-A  + A — 1//3,  p = a//3.  Then  Cjv  ~ 1 + \p 
and  C’N  « p+e~p,  if  p < A;  CN  » ^(e2p-2A-l-2p+2A)(3-2//3+2A)+i(p+2A-A2/p) 
and  C'N  m 1//3  + |(e2e_2A  — 1)(3  — 2//3  + 2A)  — \ {p  — A),  if  p > A.  For  a=lwe  get  the 
smallest  Cn  ~ 1.69  when  fi  « .853;  the  smallest  CN  « 1.79  occurs  when  /3  ss  .782.  The 
setting  P = .86  gives  near-optimal  search  performance  for  a wide  range  of  a.  So  it  pays 
to  put  the  first  collisions  into  an  area  that  doesn’t  conflict  with  hash  addresses,  even 
though  a smaller  range  of  hash  addresses  will  cause  more  collisions  to  occur.  These 
results  are  due  to  Jeffrey  S.  Vitter,  JACM  30  (1983),  231-258. 

44.  (The  following  brute-force  approach  was  the  solution  found  by  the  author  in  1972; 
a much  more  elegant  solution  by  M.  S.  Paterson  is  explained  in  Mathematics  for  the 
Analysis  of  Algorithms  by  Greene  and  Knuth  (Birkhauser  Boston,  1980),  §3.4.  Paterson 
also  found  significant  ways  to  simplify  several  other  analyses  in  this  section.) 

Number  the  positions  of  the  array  from  1 to  m,  left  to  right.  Considering  the  set  of 
all  (2)  sequences  of  operations  with  k “p  steps”  and  n — k “q  steps”  to  be  equally  likely, 
let  g(m,n+l,k,r)  be  (£)  times  the  probability  that  the  first  r — 1 positions  become 
occupied  and  the  rth  remains  empty.  Thus  g(m,l,k,r)  is  (m  — i)-0-i-fc)  times  the 
sum,  over  all  configurations 

1 < ai  < ■ ■ ■ < ak  < l,  (ci, . . . ,ci_i_fc),  2 < a < m, 

of  the  probability  that  the  first  empty  location  is  r,  when  the  a^  th  operation  is  a p step 
and  the  remaining  l — 1 — k operations  are  q steps  that  begin  by  selecting  positions 
Ci,  . . . , ci—i  — k,  respectively.  By  summing  over  all  configurations  subject  to  the  further 
condition  that  the  a3  th  operation  occupies  position  bj,  given  1 < bi  < ■ ■ ■ < bk  < r,  we 
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obtain  the  recurrence 


g(m,l,k+l,r) 


E 

a<l 

b<r 

l<b<a 


(l  — b — 1)!  (m  — r)! 
(I  — r)!  (m  — 6)! 


(m  — l + 1 )g(m,  a,  k,  6); 


g(m,l,0,r) 


(i l — 1)!  (m  — r)! 
(I  — r)!  m! 


(m  — l + 1) 


where  Pi  = (m/(m  — l))1  \ Letting  G(m,l,k) 


that 


G(m,  l,  fc+1) 


m — l + 1 
m — l + 2 


i-i 

^G(m,a,fc); 

0=1 


^ ^ir)i  it  follows 

G(m,l,0)  = — — lT^(rn  + Pi), 
in  — l ~r  Z 


The  answer  to  the  stated  problem  is  m — J2k=o  Pk<ln  kG(m,  n+l,k),  which  (after  some 
maneuvering)  equals  m — ((m  — n)/(m  — n 4-  1))(Q„  + mR  + pSR),  where 

Qj  = 

R=( PV..fi 5— UttV  1 P— ), 

V 771+1/V  m)  \ m — n + 2/  m + 1 — j J 

3=0 

(i--£-r)  fi — E-Wi- jl)  * 

V m + 1/  V.  m+l/\  m) 

_ y'  (l  — 1/ (m  + 1 — k))Qk 
feoIIi=o  (i-pAm+i-j)) 

When  p = 1/m,  Qj  = 1 for  all  j.  Letting  re  = m + 1,  n = aw,  w ->  oo,  we  find 
lnfi  = —(Hw  — Hw(i~a))p  + 0(p2);  hence  R — 1 + w-1ln(l  — a)  + 0(w~2);  and 
similarly  S = aw  + 0(1).  Thus  the  answer  is  (1  — a)-1  — 1 — a — ln(l  — a)  + 0(w-1). 

Notes:  The  simpler  problem  “with  probability  p occupy  the  leftmost,  otherwise 
occupy  any  randomly  chosen  empty  position”  is  solved  by  taking  Pj  = 1 in  the  formulas 
above,  and  the  answer  is  m — (m  + l)(m  — n)R/(m  — n + 1).  To  get  C’N  for  random 
probing  with  secondary  clustering,  set  n = N,  m — M and  add  1 to  the  answer  above. 

45.  Yes.  See  L.  Guibas,  JACM  25  (1978),  544-555. 

46.  Define  the  numbers  [[£]]  for  k > 0 by  the  rule 

?(*:*)[[:]] 

for  all  x and  all  nonnegative  integers  n.  Setting  x = —1,  —2, . . . , — n — 1 implies  that 

[[fc]]  =E(Jj)(-1)i(n_^n  for  0 < fc  < n; 
j J 

then  setting  x = 0 implies  that  we  may  take  [[£]]  =0  for  all  k > n,  so  the  two  sides  of 
the  defining  equation  are  polynomials  in  x of  degree  n that  agree  on  n + 1 points.  It 
follows  that  the  numbers  [[£]]  have  the  stated  property. 
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Let  f(N,r)  be  the  number  of  hash  sequences  a\. . . aN  that  leave  the  first  r 
locations  occupied  and  the  next  one  empty.  There  are  possible  patterns  of 

occupied  cells,  and  each  pattern  occurs  as  many  times  as  there  are  sequences  aj  . . . a'N , 
1 < a'  < TV,  that  contain  each  of  the  numbers  r + 1,  r + 2,  . . . , N at  least  once.  By 
inclusion-exclusion,  there  are  [[^J]  such  sequences;  hence 


f(N,r)  = 


/ M - r - 1\  " 

■ N r 

V N-r  ) . 

.N-r  J. 

Now 


C\ 


"N  = l + M-N~1'£f(N,r)  $>+  £ 


r= 0 
N 


-1  M-l 


a= 0 a=r+ 1 


N-r 
M — r — 1 


(»•+!) 


1 + M-"-1  ^2  f(N,  r)(N  + (N-  l)r). 


Let  Sn(x)  = J2k  fcCCfc^fc) [[fc ]]i  we  have 


) + E 

k 


x + 1 + k 
k 


hence  Sn{x)  = (x+  l)((x  + n + 2)n  - (s  + n+  l)n).  It  follows  that  C'N  - IV(1  + 1/M)- 
(N  — 1)(1  — N/M)(l  + 1/M)n  ps  7V(l  — (1  — cc)e“);  and  CN  = (N  - 1)((1  + l/M)/2  + 
(1  + 1 /M)N)  + (3 M2  +6 M + 2)((1  + 1 /M)n  - l)/IV  - (3 M + 2)(1  + 1/M)N,  which  is 
(e  — 2.5 )M  + 0(1)  when  IV  = M — 1. 

For  further  properties  of  the  numbers  [[”]],  see  John  Riordan,  Combinatorial 
Identities  (New  York:  Wiley,  1968),  228-229. 

47.  The  analysis  of  Algorithm  L applies,  almost  word  for  word!  Any  probe  sequence 
with  cyclic  symmetry,  and  which  explores  only  positions  adjacent  to  those  previously 
examined,  will  have  the  same  behavior. 

48.  C'N  = 1 + p + p2  + • • • , where  p = N/M  is  the  probability  that  a random  location 
is  filled;  hence  C'N  = M/(M  - N),  and  CN  = IV1  c'k  = N~1M{Hm  - HM-n). 
These  values  are  approximately  equal  to  those  for  uniform  probing,  but  slightly  higher 
because  of  the  chance  of  repeated  probes  in  the  same  place.  Indeed,  for  4 = N < M < 
16,  linear  probing  is  better! 

In  practice  we  wouldn’t  use  infinitely  many  hash  functions;  some  other  scheme 
like  linear  probing  would  ultimately  be  used  as  a last  resort.  This  method  is  inferior 
to  those  described  in  the  text,  but  it  is  of  historical  importance  because  it  suggested 
Morris’s  method,  which  led  to  Algorithm  D.  See  CACM  6 (1963),  101,  where  M.  D. 
Mcllroy  credits  the  idea  to  V.  A.  Vyssotsky;  the  same  technique  had  been  discovered 
as  early  as  1956  by  A.  W.  Holt,  who  used  it  successfully  in  the  GPX  system  for  the 
UNIVAC. 


49.  C'N  - 1 — J2k>b(k  ~ b)PNk  « Y,k>b(k  ~ b)e  ab(a b)k/k\  = abtb(a).  [Note:  We 
have 


6>0  \fc>6  / 


P'(l)  z(P{z)  - 1) 
1-2  (1-2)2 
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in  general,  if  P(z)  = P0  + P\z  + ■ ■ • is  any  probability  generating  function.]  And 

k>b 

= ^ - !)  - 2k(b  - 1)  + b(b  - 1 ))PNk 

^ k>b 

= ie-i,Q(fea)f,6!-1(6  + ba  - 2b  + 2 + (6a2  - 2a(6  - 1)  + 6 - 1 )R{a,  6)). 

[The  analysis  of  successful  search  with  chaining  was  first  carried  out  by  W.  P.  Heis- 
ing  in  1957.  The  simple  expressions  in  (57)  and  (58)  were  found  by  J.  A.  van  der  Pool  in 
1971;  he  also  considered  how  to  minimize  a function  that  represents  the  combined  cost  of 
storage  space  and  number  of  accesses.  We  can  determine  the  variance  of  C'N  and  of  the 
number  of  overflows  per  bucket,  since  J2k>b(k  ~ b)2PNk  = (2N/M)(Cn  — 1)  — (C'N  — 1). 
The  variance  of  the  total  number  of  overflows  may  be  approximated  by  M times  the 
variance  in  a single  bucket,  but  this  is  actually  too  high  because  the  total  number  of 
records  is  constrained  to  be  TV.  The  true  variance  can  be  found  as  in  exercise  37.  See 
also  the  derivation  of  the  chi-square  test  in  Section  3.3. 1C.] 

50.  And  next  that  Q0(M,N  — 1)  = ( M/N)(Q0(M,N ) - l).  In  general,  rQr(M,N)  = 
MQr_2(M, TV)  - (M  - TV  - r)Qr-i(M, TV)  = M(Qr_x(M,TV  + 1)  - Qr_i(M,TV)); 
Qr(M,N-  1)  = (M/N)(Qr(M,N)  - Qr-i{M, TV)). 

51.  R(a,n)  — a~1(nlean(an)~n  — Qo(otn,n)). 

52.  See  Eq.  1.2.11.3-(g)  and  exercise  3.1-14. 

53.  By  Eq.  1.2.11.3-(8),  a(an)nR(a,n)  = ean/y(n+l,  an);  hence  by  the  suggested 
exercise  R(a,n)  = (1  — a)-1  — (1  — a)~3n_1  + 0(n~2).  [This  asymptotic  formula  can 
be  obtained  more  directly  by  the  method  of  (43),  if  we  note  that  the  coefficient  of  ak 
in  R(a,  n ) is 

1-(fc22)n-1+0(fc4n-2). 

In  fact,  the  coefficient  of  ak  is 


by  Eq.  1.2.9-(28).] 

54.  Using  the  hint  together  with  Eqs.  1.2.6-(53)  and  1.2.6-(4g),  we  have 


X>(«)  = E 

6>1  m > 1 


(m  + l)(m)m! 


Eft)*-1)-*"*1  = 

k 


E qT72- 

m > 1 


The  hint  follows  from  Rummer’s  well-known  hypergeometric  identity  e zF(a;  b;  z)  = 
F(b  — a;  b;  — z ),  since  (n  + 1)!  t„(a)  = e~na(an)nF(2;  n + 2;  an);  see  Crelle  15  (1836), 
39-83,  127-172,  Eq.  26.4. 

55.  If  B(z)C(z)  = Siz',  we  have  co  = s0  -I b Sb,  ci  = s&+1,  c2  = S(,+2,  . . . ; hence 

B(z)C(z)  = zbC(z)  + Q(z).  Now  P(z)  = zb  has  6—1  roots  qj  with  \qj\  < 1,  determined 
as  the  solutions  to  = a ]~\j,  a ; = e2^^6.  To  solve  let  t = aq 
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and  z = awe  “ so  that  t = ze* . By  Lagrange’s  formula  we  get 


1 

1 -q 


1 + ErE 


(n  — r)\ 


‘+E'SsHrE(;,)(-rv 


r>l  m>0 


n>r 


By  Abel’s  limit  theorem,  letting  |w|  — ► 1 from  inside  the  unit  circle,  this  can  be 
rearranged  to  equal 


^ + E £<-'>”  E (m„"  2)(-d- v«<» + 1)-'. 


Now  replacing  w by  cjj  and  summing  for  1 < j < b yields 


6-1 

2 


1 


+ Y'  a 


1 

2 


(-l)m6y^  fm-2 

n>l 


)(-l  )nb-\nb)m 


and  the  desired  result  follows  after  some  more  juggling  using  the  hint  of  exercise  54. 

This  analysis,  applied  to  a variety  of  problems,  was  begun  by  N.  T.  J.  Bailey, 
J.  Roy.  Stat.  Soc.  B16  (1954),  80-87;  M.  Tainiter,  JACM  10  (1963),  307-315;  A.  G. 
Konheim  and  B.  Meister,  JACM  19  (1972),  92-108. 

56.  See  Blake  and  Konheim,  JACM  24  (1977),  591-606.  Alfredo  Viola  and  Patricio 
Poblete  [Algorithmic a 21  (1998),  37-71]  have  shown  that 


°*»  ■ -■ w nE(6”y  V E(i-_JI)<-«"“-,‘'" 

j> 2 fc>l 


3>  2 
6-1 


/ irM  1 1 

- V 86  + 36  + 6 E 


+ 


36  6 (l  — T(e2’r*J/i-i))  24  V 263M 


+ 0(b~2  M~x), 


where  T is  the  tree  function  of  Eq.  2.3.4.4-(3o). 

58.  0 1 2 3 4 and  0 2 4 1 3,  plus  additive  shifts  of  1 1 1 1 1 mod  5,  each  with 
probability  Similarly,  for  M = 6 we  need  30  permutations,  and  a solution  exists 
starting  with 

^x  012345,  ix  013254,  ^x  024315,  ix  023451,  ix  034125. 
For  M = 7 we  need  49,  and  a solution  is  generated  by 

2^x  0123456,  yjjj  x 0153246,  2^x  0243516,  ^§5X  0263145, 

2^x  0361425,  j^x  0326415,  T2?x  0315426. 

59.  No  permutation  can  have  a probability  larger  than  1/ ( [a^/2j  ) ’ so  there  must  be  at 
least  ( la^/2J ) = exp(M  In  2 + O(logM))  permutations  with  nonzero  probability. 

60.  Preliminary  results  have  been  obtained  by  Ajtai,  Komlos,  and  Szemeredi,  Infor- 
mation Processing  Letters  7 (1978),  270-273. 

62.  See  the  discussion  in  AMM  81  (1974),  323-343,  where  the  best  cyclic  hashing 
sequences  are  exhibited  for  M < 9. 

63.  MHm,  by  exercise  3. 3. 2-8;  the  standard  deviation  is  « ttM/\/&. 
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64.  The  average  number  of  moves  is  equal  to  |(iV  — 1 )/M  + §(JV  — 1 )(N  — 2 )/M2  + 

f {N  — 1)(N  — 2 )(N  — 3 )/M3  4 ~ — + In  [An  equivalent  problem  is  solved 

in  Comp.  J.  17  (1974),  139-140.] 

65.  The  keys  can  be  stored  in  a separate  table,  allocated  sequentially  (assuming  that 
deletions,  if  any,  are  LIFO).  The  hash  table  entries  point  to  this  “names  table”;  for 
example,  TABLE  [i]  might  have  the  form 


L‘ 

KEY  [i] 

where  L,  is  the  number  of  words  in  the  key  stored  at  locations  KEY  [i] , KEY  [i]  + 1, . . . . 

The  rest  of  the  hash  table  entry  might  be  used  in  any  of  several  ways:  (i)  as  a 
link  for  Algorithm  C;  (ii)  as  part  of  the  information  associated  with  the  key;  or  (iii)  as 
a “secondary  hash  code.”  The  latter  idea,  suggested  by  Robert  Morris,  sometimes 
speeds  up  a search  [we  take  a careful  look  at  the  key  in  KEY[i]  only  if  h2(K)  matches 
its  secondary  hash  code,  for  some  function  h2(K)]. 

66.  Yes;  and  the  arrangement  of  the  records  is  unique.  The  average  number  of  probes 
per  unsuccessful  search  is  reduced  to  Cjv-i,  although  it  remains  C'N  when  the  JVth 
term  is  inserted.  This  important  technique  is  called  ordered  hashing.  See  Comp.  J.  17 
(1974),  135-142;  D.  E.  Knuth,  Literate  Programming  (1992),  144-149,  216-217. 

67.  (a)  If  Cj  = 0 in  (44),  an  optimum  arrangement  is  obtained  by  sorting  the  a’s  into 

nonincreasing  “cyclic  order,”  assuming  that  j — 1 > > 0 > M — 1 > •••  > j. 

(b)  Between  steps  L2  and  L3,  exchange  the  record-in-hand  with  TABLE[i]  if  the  latter  is 
closer  to  home  than  the  former.  [This  algorithm,  called  “Robin  Hood  hashing”  by  Celis, 
Larson,  and  Munro  in  FOCS  26  (1985),  281-288,  is  equivalent  to  a variant  of  ordered 
hashing.]  (c)  Let  h(m,  n,  d)  be  the  number  of  hash  sequences  that  make  Co  < d.  It  can 
be  shown  [Comp.  J.  17  (1974),  141]  that  ( h(m,n,d ) — h(m,n,d  — 1))M  is  the  total 
number  of  occurrences  of  displacement  d > 0 among  all  MN  hash  sequences,  and  that 
we  can  write  h(M,  N,  d)  = a(M,  N,  d + 1)  - Na(M,  N - 1,  d + 1)  where  a(m,  n,  d)  = 
53fc=o  (fc)  {rn+d—k)n~k(k—d)k.  An  elaborate  calculation  using  the  methods  of  exercises 
28  and  50  now  shows  that  the  average  value  of  53  V is 


M1_JV  d2(h(M,  N,  d)  - h(M,  N,d—  1)) 


M’ 


2 M N N2 


N 


2 + 3 + 6 + 6 M 6 M M' 


(M  _ 
\ 2 


M 


1 7 2 a cr 

2(1 -a)2  6(1  — a)  + 3 + 6 + 6 


N 2\ 

y + 3jQo(M,lV) 

+ 0(1) 


when  N = aM.  Without  the  modification  (see  exercise  28),  E 53  d2  comes  to 
f(Q2(M,N)  - Qi(M,N))  - y(Q0(M,AO  - !)  + f 

If  the  records  all  have  approximately  the  same  displacement  d,  and  if  successful 
searches  are  significantly  more  common  than  unsuccessful  ones,  it  is  advantageous  to 
start  at  position  h!  — h(K)+d  and  then  to  probe  h'  — 1,  h'  + 1,  h'  — 2,  etc.  P.  V.  Poblete, 
A.  Viola,  and  J.  I.  Munro  have  shown  [Random  Structures  & Algorithms  10  (1997), 
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221-255]  that  Y)  d2  can  be  made  almost  as  small  as  in  the  Robin  Hood  method  by  using 
a much  simpler  approach  called  “last-come-first-served”  hashing,  in  which  every  newly 
inserted  key  is  placed  in  its  home  position;  all  other  keys  move  one  step  away  until 
an  empty  slot  is  found.  The  Robin  Hood  and  last-come-first-served  techniques  apply 
to  double  hashing  as  well  as  to  linear  probing,  but  the  reduction  in  probes  does  not 
compensate  for  the  increased  time  per  probe  with  respect  to  double  hashing  unless  the 
table  is  extremely  full.  (See  Poblete  and  Munro,  J.  Algorithms  10  (1989),  228-248.) 

68.  The  average  value  of  (di  + • • • + (In)2  can  be  shown  to  equal 

~((M  - N)3  + {N  + 3 )(M  - IV)2  + (81V  + 1 )(M  - N)  + 51V2  + 41V  - 1 

- ((M  - N)3  + 4 (M  - N)2  + (6 N + 3) (AT  - N)  + 8 N)Q0(M,  N - 1)) 
using  the  connection  between  the  parking  problem  and  connected  graphs  mentioned  in 
the  answer  to  exercise  31.  To  get  the  variance  of  the  average  number  of  probes  in  a 
successful  search,  divide  by  N 2 and  subtract  \(Q0(M,  1V-1)-1)2;  this  is  asymptotically 
3^((1  + 2a)/(l  — a)4  — 1 )/N  + 0(N~2).  (See  P.  Flajolet,  P.  V.  Poblete,  and  A.  Viola, 
Algorithmica  22  (1998),  490-515;  D.  E.  Knuth,  Algorithmic a 22  (1998),  561-568. 
The  variance  calculated  here  should  be  distinguished  from  the  total  variance,  which  is 
^Yl,d2/N  — \(Qo(M,  N — 1)  — l)2;  see  the  answers  to  exercises  37  and  67.) 

69.  Let  qk  = pk  +Pk+i  H ; then  the  inequality  qk  > max(0, 1 — (k  — 1 )(M  — n)/M ) 

gives  a lower  bound  on  C'N  = Yk>i1k- 

70.  A remarkably  simple  proof  by  Lueker  and  Molodowitch  [ Combinatorics  13  (1993), 
83-96]  establishes  a similar  result  but  with  an  extra  factor  (log  M)2  in  the  O-bound;  the 
stated  result  follows  in  the  same  way  by  using  sharper  probability  estimates.  A.  Siegel 
and  J.  P.  Schmidt  have  shown,  in  fact,  that  the  expected  number  of  probes  in  double 
hashing  is  1/(1  — a)  + 0(1/M ) for  fixed  a = N/M.  [Computer  Science  Tech.  Report 
687  (New  York:  Courant  Institute,  1995).] 

72.  [J.  Comp.  Syst.  Sci.  18  (1979),  143-154.]  (a)  Given  keys  Ku  . . . , KN  and  K,  the 
probability  that  Kj  is  in  the  same  list  as  K is  < 1/M  if  K ± Kj.  Hence  the  expected 
list  size  is  < 1 4-  (N  — 1 )/M. 

(b)  Suppose  there  are  Q possible  characters;  then  there  are  MQ  possible  choices 
for  each  hj . Choosing  each  hj  at  random  is  equivalent  to  choosing  a random  row  from 

a matrix  H of  MQl  rows  and  Ql  columns,  with  the  entry  h(x i . . . xi)  = (hi(xi)  -I h 

hi(xi))  mod  M in  column  x\ . . . xj.  In  columns  K = X\  ...xi  and  K'  = x[...  x\  with 
xj  / x'j  for  some  j,  we  have  h(K)  = (s  + hj(xj))  mod  M and  h(K')  = (s'  + hj(x'j)) 
mod  M,  where  s = hi(xi)  and  s'  = YY^j  h,(x')  are  independent  of  hj.  The  value 
of  hj(xj)  — hj(x'j)  is  uniformly  distributed  modulo  M;  hence  we  have  h(K)  = h(K') 
with  probability  1/M,  regardless  of  the  values  of  s and  s'. 

(c)  Yes;  adding  any  constant  to  hj(xj)  changes  h{x)  by  a constant,  modulo  M. 

73.  (i)  This  is  the  special  case  of  exercise  72(c)  when  each  key  is  regarded  as  a sequence 
of  bits,  not  characters.  [It  was  invented  as  early  as  1970  by  Alfred  L.  Zobrist,  whose 
original  technical  report  has  been  reprinted  in  ICC  A Journal  13  (1990),  69-73.]  (ii)  The 
proof  of  (b)  shows  that  it  suffices  to  show  that  hj(xj)  - hj(x'j)  is  uniform  modulo  M 
when  Xj  # x'  . And  in  fact,  the  probability  that  hj(xj)  = y and  hj(x'j)  = y'  is  1/M2 , 
for  any  given  y and  y' , because  the  congruences  djXj  +bj  =y  and  (lj x'3  + bj  = y'  have 
a unique  solution  ( a.j,bj ) for  any  given  (y,  y'),  modulo  the  prime  M. 

When  M is  not  prime  and  p is  a prime  > M,  a similar  result  holds  if  we  let 
hj(xj)  = (((ijX-j  + bj)  modp)  mod  M,  where  a,j  and  bj  are  chosen  randomly  mod  p. 
In  this  case  the  family  is  not  quite  universal,  but  it  comes  close  enough  for  practical 
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purposes:  The  probability  that  different  keys  collide  is  at  most  1 /M  + r(M  — r)/Mp 2 < 
1/M  + M/4p2,  where  r = p mod  M. 


74.  The  statement  is  false  in  general.  For  example,  suppose  M = N = n2,  and  consider 
the  matrix  H with  (n)  rows,  one  for  every  way  to  put  n zeros  in  different  columns; 
the  nonzero  entries  are  1,  2,  . . . , N — n from  left  to  right  in  each  row.  This  matrix  is 
universal  because  there  are  (^~2)  = (^77  7737  < (^)  (77)  2 = R/M  matches  in  every 
pair  of  columns.  But  the  number  of  zeros  in  every  row  is  VN  / 0(1)  + 0(N/M). 

Notes:  This  exercise  points  out  that  expected  list  size  is  quite  different  from  the 
expected  number  of  collisions  when  a new  key  is  inserted.  Consider  letting  h{x  1 . . . Xi)  = 
h\(x\),  where  hi  is  chosen  at  random.  This  family  of  hash  functions  makes  the  expected 
size  of  every  list  N/M\  yet  it  is  certainly  not  universal,  because  a set  of  N keys  that 
have  the  same  first  character  x\  will  lead  to  one  list  of  size  N and  all  other  lists  empty. 
The  expected  number  of  collisions  will  be  N(N  — 1)/2,  but  with  a universal  hash  family 
this  number  is  at  most  N(N  — 1)/2M,  regardless  of  the  set  of  keys. 

On  the  other  hand  we  can  show  that  the  expected  size  of  every  list  is  0(1)  + 
0(N/i/M)  in  a universal  family.  Suppose  there  are  Zh  zeros  in  row  h.  Then  that  row 
contains  at  least  (**)  pairs  of  equal  elements.  The  maximum  of  Ylh=iXh  subject  to 
^2h=i  (*2)  — (^)TfyM  occurs  when  each  zh  is  equal  to  2 where  (2)  = ( 2 ) /M,  namely 


*=2  + 


1 N(N  — 
4 + M 


1) 


< 1 + 


N(N  — 1) 


M 


75.  (a)  Obviously  true,  even  if  h2,  ...,  hi  are  identically  zero,  (b)  True,  by  the 
answer  to  72(b).  (c)  True.  The  result  is  clear  if  K,  K',  and  K"  all  differ  in  some 
character  position.  Otherwise,  say  Xj  = x'j  / x'j  and  xk  ^ x'k  = x'k.  Then  the 
quantities  hj(xj)  + hk(xk),  hj(xj)  + hk(x'k),  and  hj(xj)  + hk(x'k)  are  independent  of 
each  other,  uniformly  distributed,  and  independent  of  the  other  l — 2 characters  of  the 
keys,  (d)  False.  Consider,  for  example,  the  case  M = l = 2 with  1-bit  characters.  Then 
all  four  keys  hash  to  the  same  location  with  probability  1/4. 

76.  Use  h(K)  = (ho(l)  + hi(xi)-\ \-hi(xi))  mod  M,  where  each  hj  is  chosen  as  in  ex- 

ercise 73.  Generate  the  random  coefficients  for  hj  (and,  if  desired,  precompute  its  array 
of  values)  when  a key  of  length  > j occurs  for  the  first  time.  Since  l is  unbounded,  the 
matrix  H is  infinite;  but  only  a finite  portion  is  relevant  in  any  particular  program  run. 

77.  Let  p < 2~16  be  the  probability  that  two  32-bit  keys  have  the  same  image  under  H . 
The  worst  case  occurs  when  two  given  keys  agree  in  seven  of  their  eight  32-bit  subkeys; 
then  the  probability  of  collision  is  1 - (1  -p)4  < 4p.  [See  Wegman  and  Carter,  J.  Comp. 
Syst.  Sci.  22  (1981),  265-279.] 


78.  Let  g(x)  = [x/2k\  mod  2n~fc  and  S(x,x')  = X!L=o 1 ld(x  + ■&)  = d(x'  + &)]■  Then 
5(x  + \,x'  + 1)  = 5(x,x')  + [ g(x  + 2 k)  = g{x'  + 2*)]  - [p(z)  =g(x')\  — 5(x,x').  Also 
<5(a:,0)  = ( 2k  — ( x mod  2n))  + ( 2k  — ((— x)  mod  2n))  when  0 < x < 2n,  where  a — b = 
max(a  — 5,0).  These  formulas  characterize  S(x,  x)  when  x ^ x'  (modulo  2n). 

Now  let  A — {a  | 0 < a < 2n , a odd}  and  B = {6  | 0 < b < 2k}.  We  want  to 
show  that  ^Za€A  J2beBl9(ax  + 5)  = g{ax'  + b )]  < R/M  = 2n~1+k/2n~k  = 22k~1  when 
0 < x < x'  < 2".  And  indeed,  if  x'  — x — 2pq  with  q odd,  then  we  have 


( 9(ax  + b)  = g{ax'  + b)}  = E S(ax,  ax')  = 2 ^ ( 2k  — ((2 paq)  mod  2n)) 

a£A  b£B  a&A  a£A 

2n— p— l_1  2fc_p_l_1 

= 2P+1  (2fc  — 2p(2j  + 1))  = 2P+1  ^ (2k  — 2p(2j  + l))[p<  fc]  = 22fc_1[p  < k], 

3= 0 j=o 
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[See  Lecture  Notes  in  Computer  Science  1672  (1999),  262-272.] 


SECTION  6.5 

1.  The  path  described  in  the  hint  can  be  converted  by  changing  each  downward  step 
that  runs  from  (i  — 1,  j)  to  a “new  record  low”  value  (i,j  — 1)  into  an  upward  step.  If  c 
such  changes  are  made,  the  path  ends  at  (m,  n — 2t  + 2c),  where  c > 0 and  c > 2t  — n; 
hence  n — 2t  + 2c  > n — 2k.  In  the  permutation  corresponding  to  the  changed  path, 
the  smallest  c elements  of  list  B correspond  to  the  downward  steps  that  changed,  and 
list  A contains  the  t — c elements  corresponding  to  downward  steps  that  didn’t  change. 

When  t = k it  is  not  difficult  to  see  that  the  construction  is  reversible;  hence 
exactly  (£)  permutations  are  constructed.  Incidentally,  according  to  this  proof,  the 
contents  of  lists  A and  C may  appear  in  arbitrary  order. 

Notes:  We  have  counted  these  paths  in  another  way  in  exercise  2. 2. 1-4.  When 
k = Ln/2J  this  construction  proves  Sperner’s  Theorem,  which  states  that  it  is  impossible 
to  have  more  than  (^n"2j)  subsets  of  {1, 2, ... , n}  with  no  subset  contained  in  another. 
[Emanuel  Sperner,  Math.  Zeitschrift  27  (1928),  544-548.]  For  if  we  have  such  a 
collection  of  subsets,  each  of  the  (£)  permutations  can  have  at  most  one  of  the  subsets 
appearing  in  the  initial  positions,  yet  each  subset  appears  in  some  permutation.  The 
construction  used  here  is  a disguised  form  of  a more  general  construction  by  which 
N.  G.  de  Bruijn,  C.  van  Ebbenhorst  Tengbergen,  and  D.  Kruyswijk  [IVieuw  Archief 
voor  Wiskunde  (2)  23  (1951),  191-193]  proved  the  multiset  generalization  of  Sperner’s 
Theorem:  “Let  M be  a multiset  containing  n elements  (counting  multiplicities).  The 
collection  of  all  [n/2j -element  submultisets  of  M is  the  largest  possible  collection  such 
that  no  submultiset  is  contained  in  another.”  For  example,  the  largest  such  collection 
when  M = {a,  a,  6,  b,  c,  c}  consists  of  the  seven  submultisets  {a,  a,  b},  {a,  a,  c},  {a,  b,  b}, 
{a,b,c},  {a,  c,  c},  {b,b,c},  {b,  c,  c}.  This  would  correspond  to  seven  permutations  of 
six  attributes  Ai,  B\,  Ai,  B2,  A3,  B3  in  which  all  queries  involving  A,  also  involve  Bt. 
Further  comments  appear  in  a paper  by  C.  Greene  and  D.  J.  Kleitman,  J.  Combinatorial 
Theory  A20  (1976),  80-88. 

2.  Let  dijk  be  a list  of  all  references  to  records  having  (i,j,  k)  as  the  respective  values 
of  the  three  attributes,  and  assume  that  aon  is  the  shortest  of  the  three  lists  aon,  aioi, 
ano.  Then  a minimum-length  list  is  aooiaouaiiiflioiaiooaiioaiiiaoiiaoio.  However,  if 
aon  is  empty  and  so  is  either  of  aooi,  aoio,  or  aioo,  the  length  can  be  shortened  by 
deleting  one  of  the  two  occurrences  of  am  [CACM  15  (1972),  802-808]. 

3.  (a)  Anise  seed  and/or  honey,  possibly  in  combination  with  nutmeg  and/or  vanilla 
extract,  (b)  None. 

4.  Let  pt  be  the  probability  that  the  query  involves  exactly  t bit  positions,  and  let  Pt 
be  the  probability  that  t given  positions  are  all  1 in  a random  record.  Then  the  answer 
is  ^2tPtPt,  minus  the  probability  that  a particular  record  is  a “true  drop”;  the  latter 
probability  is  {^1%)  / (^),  where  N = (£).  By  the  principle  of  inclusion  and  exclusion, 

Pt  = (t.)f{n-j,k,r)/f{n,k,r), 

i> 0 VJ/ 


where  f(n,k,r)  is  the  number  of  possible  choices  of  r distinct  fc-bit  attribute  codes  in 
an  n-bit  field,  namely  (‘^)  where  N = (").  And  if  q = r we  have,  by  exercise  1.3.3-26, 


Pt  = (-!)'(*  ^)  (t  ” Z)P‘+'  = (™) 
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Notes:  The  calculations  above  were  first  carried  out,  in  more  general  form,  by 
G.  Orosz  and  L.  Takacs,  J.  of  Documentation  12  (1956),  231-234.  The  mean  ^ tpt  is 
easily  shown  to  be  n(l  — f(n—  1,  k,  q)/f(n,  k,q)).  Another  assumption,  that  the  random 
attribute  codes  in  records  and  queries  are  not  necessarily  distinct,  as  in  the  techniques 
of  Harrison  and  Bloom,  can  be  analyzed  by  the  same  method,  setting  /(n,  k,  r)  — (£)r. 
When  the  parameters  are  in  appropriate  ranges,  we  have  Pt  « (1  - e kr/ny  and 

Y,Ptpt  « Pn  (l-<;xp  (-kq/n))- 

6-  L(t)  = E3  (7)(^)^i(i)^(f-  j)/(mi|m2)-  [Hence  if  Lx{t)  « Nia~*  and 
L2(t)  w N2a~\  then  L(t)  w NiN2a~*.] 

7.  (a)  L(l)  = 3,  L(2)  = l|.  (b)  L(l)  = 3|,  L( 2)  = 2|,  L( 3)  = 1^.  [Note:  A trivial 
projection  mapping  such  as  0 0 * * ->  0,  01**  -4  1,  1 0 * * -►  2,  1 1 * * -►  3,  has  a 
worse  worst-case  behavior;  but  it  has  a better  average  case,  because  of  the  exercise  that 
follows:  L(l)  = 3,  i(2)  = 2|,  L(3)  = l|.] 

8.  (a)  When  S = So0U  Sil,  we  have  ft(S)  = ft(S0  U Si)  + ft-i(S0)  + ft-i(Si). 

Therefore  is  the  minimum  of  ft(s0,  m-l)+ft-i(s0,  m-l)+/t_i(si,  m-1)  over 

all  so  and  si  such  that  2m  1 > so  > sj  > 0 and  so  + si  = s.  To  prove  that  the  minimum 
occurs  for  so  = [s/2]  and  si  = [s/2j,  we  can  use  induction  on  m,  the  result  being  clear 
for  m = 1:  Given  m>  2,  let  gt(s)  = ft(s,m  - 1)  and  ht(s)  = ft{s,m  - 2).  Then,  by 
induction,  gt(s0)  + ffi-i(so)  + gt-i(si)  = ht([s0/2])  + ht-1([s0/2])  + ht-i(Lso/2J)  + 
/it-i([so/2])+/it_2([so/2])-l-/it-2(Lso/2j)+h(_i([si/2])+ht_2([si/2])-|-/it_2(Lsi/2j), 
which  is  > St([s0/2]  + [si/2]) +5t_i([s0/2]  + [sr/2])  +gt-i(Ls0/2j  + Lsx/2J).  And  if 
so  > Si  + 1,  we  have  [so/2]  + [si/2]  < so,  except  in  the  case  so  = 2k  + l and  si  = 2k  — 1. 
In  the  latter  case,  however,  gt(s0)  + fl-t-i(so)  + fft-i(si)  > ht{2k  + 1)  + 2ht-i(2k)  > 
ht(2k)  + 2ht-i{2k). 

(b) Observe that the set $S$ containing the numbers $0, 1, \ldots, s-1$ in binary notation has the property that $S_0 \cup S_1 = S_0$, and $S_0$ contains $\lceil s/2\rceil$ elements. It follows, incidentally, that $f_t(2^{m-n}, m) = [z^t]\,(1+z)^n(1+2z)^{m-n}$.
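The recurrence of part (a) can be applied to explicit sets. That lets one check the generating function of part (b) and also confirm, for m = 3, that the initial segment {0, ..., s − 1} minimizes every f_t among all s-element sets. The sketch assumes that f_t(S) counts the queries with t specified bits that match at least one member of S. That reading reproduces the recurrence but is my interpretation, and the function names are hypothetical.

```python
from itertools import combinations


def f_poly(S: frozenset, m: int) -> list:
    """Coefficients [f_0(S), ..., f_m(S)] from the recurrence
    f_t(S) = f_t(S0 | S1) + f_{t-1}(S0) + f_{t-1}(S1), where S0 and S1 hold the
    (m-1)-bit prefixes of the members of S whose last bit is 0 and 1."""
    if m == 0:
        return [1 if S else 0]
    S0 = frozenset(x >> 1 for x in S if x & 1 == 0)
    S1 = frozenset(x >> 1 for x in S if x & 1)
    a, b, c = f_poly(S0 | S1, m - 1), f_poly(S0, m - 1), f_poly(S1, m - 1)
    res = [0] * (m + 1)
    for t in range(m):
        res[t] += a[t]
        res[t + 1] += b[t] + c[t]
    return res


def poly_mul(p: list, q: list) -> list:
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def closed_form(n: int, m: int) -> list:
    """Coefficients of (1 + z)^n (1 + 2z)^(m - n)."""
    p = [1]
    for factor, times in (([1, 1], n), ([1, 2], m - n)):
        for _ in range(times):
            p = poly_mul(p, factor)
    return p


print(f_poly(frozenset(range(4)), 3))  # [1, 5, 8, 4]
print(closed_form(1, 3))               # [1, 5, 8, 4]
```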

10. (a) There must be $\frac16 v(v-1)$ triples, and each element $x$ must occur in $\frac12(v-1)$ of them. (b) Since $v$ is odd, there is a unique triple $\{x_i, y_j, z\}$ for each $i$, and so $S'$ is readily shown to be a Steiner triple system. The pairs missing in $K'$ are $\{z,x_2\}$, $\{x_2,y_2\}$, $\{y_2,x_3\}$, $\{x_3,y_3\}$, $\ldots$, $\{x_{v-1},y_{v-1}\}$, $\{y_{v-1},x_v\}$, $\{x_v,z\}$. (d) Starting with the case $v = 1$ and applying the operations $v \to 2v-2$, $v \to 2v+1$ yields all nonnegative numbers not of the form $3k+2$, because the cases $6k + (0, 1, 3, 4)$ come respectively from the smaller cases $3k + (1, 0, 1, 3)$.
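The reachability claim in part (d) is easy to confirm mechanically. The sketch below uses a hypothetical function name. It explores v → 2v − 2 and v → 2v + 1 from v = 1 and compares the result with the numbers not of the form 3k + 2.

```python
def reachable(limit: int) -> set:
    """Values 0 <= v <= limit reachable from v = 1 by v -> 2v - 2 and v -> 2v + 1."""
    seen, stack = set(), [1]
    while stack:
        v = stack.pop()
        if v in seen or v < 0 or v > limit:
            continue
        seen.add(v)
        stack.extend((2 * v - 2, 2 * v + 1))
    return seen


print(sorted(reachable(20)))  # [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19]
```

Pruning at `limit` is safe here because each target 6k + r comes from a smaller predecessor 3k + r', as the answer notes.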

Incidentally, “Steiner triple systems” should not have been named after Steiner, although that name has become deeply entrenched in the literature. Steiner’s publication [Crelle 45 (1853), 181-182] came several years after Kirkman’s, and Felix Klein has noted [Vorlesungen über die Entwicklung der Math. im 19. Jahrhundert 1 (Springer, 1926), 128] that Steiner quoted English authors without giving them credit, during the later years of his life. Moreover, the concept had appeared already in two well-known books of J. Plücker [System der analytischen Geometrie (1835), 283-284; Theorie der algebraischen Curven (1839), 245-247]. Kirkman wrote his paper in response to a substantially more general problem posed by W. S. B. Woolhouse, namely to find the maximum number of t-element subsets of {1, ..., n} in which no q-element subset appears more than once; that problem remains unsolved. [See Lady’s and Gentleman’s Diary (1844), 84; (1845), 63-64; (1846), 76, 78; (1847), 62-67.]

11. Take a Steiner triple system on $2v+1$ objects. Call one of the objects $z$ and name the others in such a way that the triples containing $z$ are $\{z, x_i, \bar x_i\}$; delete those triples.
12. $\{k, (k+1) \bmod 14, (k+4) \bmod 14, (k+6) \bmod 14\}$, for $0 \le k < 14$, where $(k+7) \bmod 14$ is the complement of $k$. [Complemented systems are a special case of group divisible block designs; see Bose, Shrikhande, and Bhattacharya, Ann. Math. Statistics 24 (1953), 167-195.]
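The 14 quadruples of answer 12 can be checked directly. The sketch reads "complement" as "differs by 7 (mod 14)". Under that reading it confirms that every non-complementary pair lies in exactly one quadruple and that no complementary pair lies in any. The names are illustrative only.

```python
from collections import Counter
from itertools import combinations

blocks = [frozenset({k, (k + 1) % 14, (k + 4) % 14, (k + 6) % 14}) for k in range(14)]
pair_counts = Counter(frozenset(p) for b in blocks for p in combinations(b, 2))


def complement(x: int) -> int:
    return (x + 7) % 14


for a, b in combinations(range(14), 2):
    expected = 0 if b == complement(a) else 1
    assert pair_counts[frozenset((a, b))] == expected
print(len(blocks), sum(pair_counts.values()))  # 14 84
```

The check works because the six differences within {0, 1, 4, 6} are ±1, ±2, ±3, ±4, ±5, ±6, which are all the nonzero residues except 7.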

14. Deletion is easiest in k-d trees (a replacement for the root can be found in about $O(N^{1-1/k})$ steps). In quadtrees, deletion seems to require rebuilding the entire subtree rooted at the node being removed (but this subtree contains only about $\log N$ nodes on the average). In post-office trees, deletion is almost hopeless.
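For k-d trees, the replacement for a deleted root is usually the point of the right subtree with the smallest value in the root's discriminating coordinate. Only subtrees that discriminate on other coordinates must be searched in both directions, which is where the $O(N^{1-1/k})$ estimate comes from. The following sketch of that search assumes plain 2-d tree insertion of random points. The class and function names are hypothetical, and this is not presented as the exact deletion algorithm of Section 6.5.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Plain k-d tree insertion; the discriminator cycles through coordinates."""
    if root is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def find_min(node: Optional[Node], dim: int, depth: int, visits: List[int]) -> Optional[Point]:
    """Point with the smallest coordinate `dim` in the subtree rooted at `node`."""
    if node is None:
        return None
    visits[0] += 1
    axis = depth % len(node.point)
    # If this node discriminates on `dim`, its right subtree cannot do better.
    children = [node.left] if axis == dim else [node.left, node.right]
    best = node.point
    for child in children:
        cand = find_min(child, dim, depth + 1, visits)
        if cand is not None and cand[dim] < best[dim]:
            best = cand
    return best


def points_in(node: Optional[Node]) -> List[Point]:
    return [] if node is None else [node.point] + points_in(node.left) + points_in(node.right)


random.seed(2)
root = None
for _ in range(4000):
    root = insert(root, (random.random(), random.random()))

visits = [0]
replacement = find_min(root.right, 0, 1, visits)   # candidate new root
print(replacement == min(points_in(root.right), key=lambda p: p[0]), visits[0])
```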

16. Let each triple correspond to a codeword, where each codeword has exactly three 1 bits, identifying the elements of the corresponding triple. If u, v, w are distinct codewords, u has at most two 1 bits in common with the superposition of v and w, since it has at most one in common with v or w alone. [Similarly, from quadruple systems of order v we can construct $v(v-1)/12$ codewords, none of which is contained in the superposition of any three others, etc.]
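The argument of answer 16 can be checked on the smallest Steiner triple system, the 7-point Fano plane. The particular labeling of the triples below is my own choice. Each triple becomes a 7-bit codeword, and the sketch confirms that no codeword is covered by the superposition (bitwise OR) of two others.

```python
from itertools import permutations

FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def codeword(triple) -> int:
    """Seven-bit codeword with 1 bits exactly at the elements of the triple."""
    return sum(1 << x for x in triple)


words = [codeword(t) for t in FANO]
for u, v, w in permutations(words, 3):
    common = bin(u & (v | w)).count("1")
    assert common <= 2          # at most one bit shared with v, at most one with w
    assert u & ~(v | w) != 0    # so u is never covered by the superposition v | w
print("no codeword is covered by the superposition of two others")
```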

17. (a) Let $c_0 = b_0$ and, for $1 \le k \le n$, let $c_k = (\text{if } b_{k-1} = 0 \text{ then } * \text{ else } b_k)$, $c_{-k} = (\text{if } b_{k-1} = 1 \text{ then } * \text{ else } b_k)$. Then the basic query $c_{-n} \ldots c_0 \ldots c_n$ describes the contents of bucket $b_0 \ldots b_n$. [Consequently this scheme is a special case of combinatorial hashing, and its average query time matches the lower bound in exercise 8(b).]
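Part (a) says that the basic queries of the $2^{n+1}$ buckets partition the $2^{2n+1}$ possible keys. Here is a brute-force sketch for small n. Positions −n..n are stored in a dictionary, a representation chosen here only for clarity.

```python
from itertools import product


def basic_query(b):
    """Map bucket b = (b_0, ..., b_n) to its basic query {position: '0'|'1'|'*'}
    for positions -n..n, following the rule in part (a)."""
    n = len(b) - 1
    c = {0: str(b[0])}
    for k in range(1, n + 1):
        c[k] = "*" if b[k - 1] == 0 else str(b[k])
        c[-k] = "*" if b[k - 1] == 1 else str(b[k])
    return c


def matches(query, key):
    return all(ch == "*" or ch == key[pos] for pos, ch in query.items())


def check_partition(n: int) -> bool:
    """True if every (2n+1)-bit key matches exactly one bucket's basic query."""
    queries = [basic_query(b) for b in product((0, 1), repeat=n + 1)]
    positions = range(-n, n + 1)
    for bits in product("01", repeat=2 * n + 1):
        key = dict(zip(positions, bits))
        if sum(matches(q, key) for q in queries) != 1:
            return False
    return True


print([check_partition(n) for n in range(1, 5)])  # [True, True, True, True]
```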

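(b) Let $d_k = [\text{bit } k \text{ is specified}]$, for $-n \le k \le n$. We can assume that $d_{-k} \le d_k$ for $1 \le k \le n$. Then the maximum number of buckets examined occurs when the specified bits are all 0, and it may be computed as follows: Set $x \gets y \gets 1$. Then for $k = n, n-1, \ldots, 0$, set $(x, y) \gets (x, y)M_{d_{-k}+d_k}$, where

$$M_0 = \begin{pmatrix}1&1\\1&1\end{pmatrix},\qquad M_1 = \begin{pmatrix}1&1\\1&0\end{pmatrix},\qquad M_2 = \begin{pmatrix}1&1\\0&0\end{pmatrix}.$$

Finally, output $x$ (which also happens to equal $y$, after $k = 0$).

Say that $(x, y) \succeq (x', y')$ if $x \ge x'$ and $x + y \ge x' + y'$. Then if $(x, y) \succeq (x', y')$ we have $(x, y)M_d \succeq (x', y')M_d$ for $d = 0, 1, 2$. Now

$$\begin{aligned}
(x,y)M_2M_1^jM_0 &= (F_{j+3}x,\; F_{j+3}x),\\
(x,y)M_1M_1^jM_1 &= (F_{j+3}x + F_{j+2}y,\; F_{j+2}x + F_{j+1}y),\\
(x,y)M_0M_1^jM_2 &= (F_{j+2}x + F_{j+2}y,\; F_{j+2}x + F_{j+2}y);
\end{aligned}$$

therefore we have $(x,y)M_1M_1^jM_1 \succeq (x,y)M_2M_1^jM_0$, because $2y \ge x$; and similarly $(x,y)M_1M_1^jM_1 \succeq (x,y)M_0M_1^jM_2$, because $x \ge y$. It follows that the worst case occurs when either $d_{-k} + d_k \le 1$ for $1 \le k \le n$ or $d_{-k} + d_k \ge 1$ for $1 \le k \le n$. We also have

$$\begin{aligned}
(x,y)M_0M_1^j &= (F_{j+2}x + F_{j+2}y,\; F_{j+1}x + F_{j+1}y),\\
(x,y)M_1^jM_0 &= (F_{j+2}x + F_{j+1}y,\; F_{j+2}x + F_{j+1}y);\\
(x,y)M_2M_1^j &= (F_{j+2}x,\; F_{j+1}x),\\
(x,y)M_1^jM_2 &= (F_{j+1}x + F_jy,\; F_{j+1}x + F_jy).
\end{aligned}$$

Consequently the worst case requires the following number of buckets:

$$\begin{cases}
2^{n-t}F_{t+3}, & \text{if } 0 \le t \le n \quad [\text{from } M_1^tM_0^{n+1-t}];\\
2^{t-n}F_{3n-2t+3}, & \text{if } n \le t \le \lfloor 3n/2\rfloor \quad [\text{from } M_1^{3n-2t}(M_1M_2)^{t-n}M_0];\\
2^{2n+1-t}, & \text{if } \lceil 3n/2\rceil \le t \le 2n \quad [\text{from } M_2^{2t-3n}(M_1M_2)^{2n-t}M_0].
\end{cases}$$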
[These results are essentially due to W. A. Burkhard, BIT 16 (1976), 13-31, generalized in J. Comp. Syst. Sci. 15 (1977), 280-299; but Burkhard’s more complicated mapping from $a_0 \ldots a_{2n}$ to $b_0 \ldots b_n$ has been simplified here as suggested by P. Dubost and J.-M. Trousse, Report STAN-CS-75-511 (Stanford Univ., 1975).]
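The (x, y) recurrence and the worst-case table can be cross-checked by brute force. The sketch uses the matrices $M_0$, $M_1$, $M_2$ displayed in part (b). It counts the buckets $b_0 \ldots b_n$ whose basic query is compatible with a query having 0s at the specified positions, and it enumerates only patterns with $d_{-k} \le d_k$, as assumed in the text. The function names are hypothetical.

```python
from itertools import product

M = {0: ((1, 1), (1, 1)), 1: ((1, 1), (1, 0)), 2: ((1, 1), (0, 0))}


def step(x, y, d):
    """Row vector (x, y) times M_d."""
    (a, b), (c, e) = M[d]
    return x * a + y * c, x * b + y * e


def recurrence(d_neg, d_pos, d0):
    """Buckets examined via the recurrence; d_neg[k-1], d_pos[k-1] hold d_{-k}, d_k."""
    x = y = 1
    for k in range(len(d_pos), 0, -1):
        x, y = step(x, y, d_neg[k - 1] + d_pos[k - 1])
    x, y = step(x, y, 2 * d0)
    return x


def brute(d_neg, d_pos, d0):
    """Count buckets whose basic query is compatible with 0s at the specified bits."""
    n, count = len(d_pos), 0
    for b in product((0, 1), repeat=n + 1):
        ok = not (d0 and b[0] == 1)                       # c_0 = b_0
        for k in range(1, n + 1):
            if b[k - 1] == 0 and d_neg[k - 1] and b[k] == 1:   # c_{-k} = b_k
                ok = False
            if b[k - 1] == 1 and d_pos[k - 1] and b[k] == 1:   # c_k = b_k
                ok = False
        count += ok
    return count


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def worst_formula(n, t):
    if t <= n:
        return 2 ** (n - t) * fib(t + 3)
    if t <= 3 * n // 2:
        return 2 ** (t - n) * fib(3 * n - 2 * t + 3)
    if t <= 2 * n:
        return 2 ** (2 * n + 1 - t)
    return 1  # every bit specified


def patterns(n):
    """All (d_neg, d_pos, d0) with d_{-k} <= d_k, as assumed in the text."""
    for choice in product([(0, 0), (0, 1), (1, 1)], repeat=n):
        for d0 in (0, 1):
            yield [p[0] for p in choice], [p[1] for p in choice], d0


for n in range(1, 5):
    worst = {}
    for dn, dp, d0 in patterns(n):
        r = recurrence(dn, dp, d0)
        assert r == brute(dn, dp, d0)
        t = sum(dn) + sum(dp) + d0
        worst[t] = max(worst.get(t, 0), r)
    assert all(worst[t] == worst_formula(n, t) for t in worst)
print("recurrence, brute force, and worst-case formulas agree for n = 1..4")
```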

18. (a) There are $2^n(m-n)$ *’s altogether, hence $2^n n$ digits, with $2^n n/m$ digits in each column. Half of the digits in each column must be 0. Hence $2^{n-1}n/m$ is an integer, and each column contains $(2^{n-1}n/m)^2$ mismatches. Since each pair of rows has at least one mismatch, we must have $2^n(2^n-1)/2 \le (2^{n-1}n/m)^2 m$.
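The counting argument of part (a) gives a simple necessary condition that can be tabulated. The sketch below (hypothetical function name) returns False when $2^{n-1}n/m$ is not an integer or when the mismatch bound fails.

```python
from fractions import Fraction


def abd_counting_ok(m: int, n: int) -> bool:
    """Necessary conditions from part (a) for an ABD(m, n), with 1 <= n <= m."""
    zeros_per_column = Fraction(2 ** (n - 1) * n, m)
    if zeros_per_column.denominator != 1:
        return False
    pairs_of_rows = 2 ** n * (2 ** n - 1) // 2
    return pairs_of_rows <= zeros_per_column ** 2 * m


print(abd_counting_ok(4, 3), abd_counting_ok(4, 2), abd_counting_ok(6, 4))  # True False False
```

Passing this test does not guarantee that a design exists; answer 20 notes, for example, that no ABD(16, 6) exists even though (16, 6) passes.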

(b) Consider the $2^n$ $m$-bit numbers that are 0 in $m-n$ specified columns. Half of these have odd parity. A row with * in any of the unspecified columns covers as many evens as odds.

(c)  *000,  *111,  0*10,  1*10,  00*1,  10*1,  010*,  110*.  This  one  isn’t  as 
uniform  as  (13),  because  a query  like  *01*  hits  four  rows  while  *10*  hits  only  two. 
Notice  that  (13)  has  cyclic  symmetry. 
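The eight rows given in part (c) can be checked mechanically. Every 4-bit key should be covered by exactly one row, each column should contain the same number of asterisks, and the two queries mentioned should hit four rows and two rows respectively. A short sketch:

```python
from itertools import product

ROWS = ["*000", "*111", "0*10", "1*10", "00*1", "10*1", "010*", "110*"]


def compatible(row: str, pattern: str) -> bool:
    """A row and a key or query agree wherever both have a digit."""
    return all(a == "*" or b == "*" or a == b for a, b in zip(row, pattern))


keys = ["".join(bits) for bits in product("01", repeat=4)]
assert all(sum(compatible(r, k) for r in ROWS) == 1 for k in keys)   # each key in one row
assert all(sum(r[i] == "*" for r in ROWS) == 2 for i in range(4))    # stars evenly spread
print(sum(compatible(r, "*01*") for r in ROWS), sum(compatible(r, "*10*") for r in ROWS))  # 4 2
```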

(d) Generate $4^3$ rows from each row of (13) by replacing each * by ****, each 0 by any one of the first four rows, and each 1 by any one of the last four rows. (A similar construction makes an ABD(mm′, nn′) from any ABD(m, n) and ABD(m′, n′).)

(e) Given an ABD(16, 9), we can encircle one * in each row in such a way that there are equally many circles in each column. Then we can split each row into two rows, with the circled element replaced by 0 and 1. To show that such encirclement is possible, note that the asterisks of each column can be arbitrarily divided into 32 groups of 7 each; then the 512 rows each contain asterisks of 7 different groups, and the 32 × 16 = 512 groups each appear in 7 different rows. Theorem 7.5.1E (the “marriage theorem”) now guarantees the existence of a perfect matching with exactly one circled element in each row and each group.

References: R. L. Rivest, SICOMP 5 (1976), 19-50; A. E. Brouwer, Combinatorics, edited by Hajnal and Sós, Colloq. Math. Soc. János Bolyai 18 (1978), 173-184.
Brouwer  went  on  to  prove  that  an  ABD(2n,n)  exists  for  all  n > 32.  The  method  of 
part  (d)  also  yields  an  ABD(32, 15)  when  (13)  is  combined  with  (15). 

19. By exercise 8, the average number with $8-k$ specified bits is $2^{k-3} f_{8-k}(8,8)\big/\binom{8}{k}$, which has the respective values $(32, 22, 14\frac67, 9\frac67, 6\frac37, 4\frac18, 2\frac{17}{28}, 1\frac58, 1) \approx (32, 22, 14.9, 9.9, 6.4, 4.1, 2.6, 1.6, 1)$ for $8 \ge k \ge 0$. These are only slightly higher than the values of $32^{k/8} \approx (32, 20.7, 13.5, 8.7, 5.7, 3.7, 2.4, 1.5, 1)$. The worst-case numbers are $(32, 22, 18, 15, 11, 8, 4, 2, 1)$.
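The exact values in answer 19 follow from the generating function of exercise 8(b) with n = 5 and m = 8, that is, $f_t(8, 8) = [z^t](1+z)^5(1+2z)^3$. Here is a sketch that recomputes them with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


f = [1]                        # coefficients of (1 + z)^5 (1 + 2z)^3
for factor in [[1, 1]] * 5 + [[1, 2]] * 3:
    f = poly_mul(f, factor)

# 2^(k-3) f_{8-k}(8, 8) / C(8, k), listed for k = 8, 7, ..., 0
averages = [Fraction(2 ** k * f[8 - k], 8 * comb(8, k)) for k in range(8, -1, -1)]
print([round(float(a), 1) for a in averages])
# [32.0, 22.0, 14.9, 9.9, 6.4, 4.1, 2.6, 1.6, 1.0]
print([round(32 ** (k / 8), 1) for k in range(8, -1, -1)])
```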

20. J. A. La Poutré [Disc. Math. 58 (1986), 205-208] showed that an ABD(m, n) cannot exist when $m > \binom{n}{2}$ and $n > 3$; therefore no ABD(16, 6) exists. La Poutré and van Lint [Util. Math. 31 (1987), 219-225] proved that there is no ABD(10, 5). We get an ABD(8, 6) from an ABD(8, 5) or ABD(4, 3) using the methods of exercise 18; this produces several nonisomorphic solutions, and additional examples of ABD(8, 6) might also exist. The only remaining possibilities (besides the trivial ABD(5, 5) and ABD(6, 6)) are ABD(8, 5) distinct from (15), and perhaps one or more ABD(12, 6).


All  right  — I’m  glad  we  found  it  out  detective  fashion; 

I wouldn't  give  shucks  for  any  other  way. 

— TOM  SAWYER  (1884) 
Table 1

QUANTITIES THAT ARE FREQUENTLY USED IN STANDARD SUBROUTINES
AND IN ANALYSIS OF COMPUTER PROGRAMS (40 DECIMAL PLACES)

√2 = 1.41421 35623 73095 04880 16887 24209 69807 85697-
√3 = 1.73205 08075 68877 29352 74463 41505 87236 69428+
√5 = 2.23606 79774 99789 69640 91736 68731 27623 54406+
√10 = 3.16227 76601 68379 33199 88935 44432 71853 37196-
∛2 = 1.25992 10498 94873 16476 72106 07278 22835 05703-
∛3 = 1.44224 95703 07408 38232 16383 10780 10958 83919-
∜2 = 1.18920 71150 02721 06671 74999 70560 47591 52930-
ln 2 = 0.69314 71805 59945 30941 72321 21458 17656 80755+
ln 3 = 1.09861 22886 68109 69139 52452 36922 52570 46475-
ln 10 = 2.30258 50929 94045 68401 79914 54684 36420 76011+
1/ln 2 = 1.44269 50408 88963 40735 99246 81001 89213 74266+
1/ln 10 = 0.43429 44819 03251 82765 11289 18916 60508 22944-
π = 3.14159 26535 89793 23846 26433 83279 50288 41972-
1° = π/180 = 0.01745 32925 19943 29576 92369 07684 88612 71344+
1/π = 0.31830 98861 83790 67153 77675 26745 02872 40689+
π² = 9.86960 44010 89358 61883 44909 99876 15113 53137-
√π = Γ(1/2) = 1.77245 38509 05516 02729 81674 83341 14518 27975+
Γ(1/3) = 2.67893 85347 07747 63365 56929 40974 67764 41287-
Γ(2/3) = 1.35411 79394 26400 41694 52880 28154 51378 55193+
e = 2.71828 18284 59045 23536 02874 71352 66249 77572+
1/e = 0.36787 94411 71442 32159 55237 70161 46086 74458+
e² = 7.38905 60989 30650 22723 04274 60575 00781 31803+
γ = 0.57721 56649 01532 86060 65120 90082 40243 10422-
ln π = 1.14472 98858 49400 17414 34273 51353 05871 16473-
φ = 1.61803 39887 49894 84820 45868 34365 63811 77203+
e^γ = 1.78107 24179 90197 98523 65041 03107 17954 91696+
e^(π/4) = 2.19328 00507 38015 45655 97696 59278 73822 34616+
sin 1 = 0.84147 09848 07896 50665 25023 21630 29899 96226-
cos 1 = 0.54030 23058 68139 71740 09366 07442 97660 37323+
−ζ′(2) = 0.93754 82543 15843 75370 25740 94567 86497 78979-
ζ(3) = 1.20205 69031 59594 28539 97381 61511 44999 07650-
ln φ = 0.48121 18250 59603 44749 77589 13424 36842 31352-
1/ln φ = 2.07808 69212 35027 53760 13226 06117 79576 77422-
−ln ln 2 = 0.36651 29205 81664 32701 24391 58232 66946 94543-
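A few entries of Table 1 can be regenerated with Python's decimal module as a sanity check. The helper name is hypothetical. The trailing sign follows the table's convention: '+' if the true value exceeds the printed one, '-' if it is smaller. Only entries whose functions the module provides (square roots, logarithms, exp) are checked.

```python
from decimal import ROUND_HALF_UP, Decimal, getcontext

getcontext().prec = 60   # 20 guard digits beyond the 40 places printed


def table1_row(value: Decimal) -> str:
    """Round to 40 decimal places and format in groups of five, with the +/- flag."""
    scaled = value.scaleb(40)
    rounded = scaled.to_integral_value(rounding=ROUND_HALF_UP)
    sign = "-" if rounded > scaled else "+"
    digits = str(int(rounded)).rjust(41, "0")
    whole, frac = digits[:-40], digits[-40:]
    groups = " ".join(frac[i:i + 5] for i in range(0, 40, 5))
    return f"{int(whole)}.{groups}{sign}"


print(table1_row(Decimal(2).sqrt()))   # 1.41421 35623 73095 04880 16887 24209 69807 85697-
print(table1_row(Decimal(2).ln()))     # 0.69314 71805 59945 30941 72321 21458 17656 80755+
```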
Table 2

QUANTITIES THAT ARE FREQUENTLY USED IN STANDARD SUBROUTINES
AND IN ANALYSIS OF COMPUTER PROGRAMS (45 OCTAL PLACES)

The names at the left of the “=” signs are given in decimal notation.

0.1 = 0.06314 63146 31463 14631 46314 63146 31463 14631 46315-
0.01 = 0.00507 53412 17270 24365 60507 53412 17270 24365 60510-
0.001 = 0.00040 61115 64570 65176 76355 44264 16254 02030 44672+
0.0001 = 0.00003 21556 13530 70414 54512 75170 33021 15002 35223-
0.00001 = 0.00000 24761 32610 70664 36041 06077 17401 56063 34417-
0.000001 = 0.00000 02061 57364 05536 66151 55323 07746 44470 26033+
0.0000001 = 0.00000 00153 27745 15274 53644 12741 72312 20354 02151+
0.00000001 = 0.00000 00012 57143 56106 04303 47374 77341 01512 63327+
0.000000001 = 0.00000 00001 04560 27640 46655 12262 71426 40124 21742+
0.0000000001 = 0.00000 00000 06676 33766 35367 55653 37265 34642 01627-
√2 = 1.32404 74631 77167 46220 42627 66115 46725 12575 17435+
√3 = 1.56663 65641 30231 25163 54453 50265 60361 34073 42223-
√5 = 2.17067 36334 57722 47602 57471 63003 00563 55620 32021-
√10 = 3.12305 40726 64555 22444 02242 57101 41466 33775 22532+
∛2 = 1.20505 05746 15345 05342 10756 65334 25574 22415 03024+
∛3 = 1.34233 50444 22175 73134 67363 76133 05334 31147 60121-
∜2 = 1.14067 74050 61556 12455 72152 64430 60271 02755 73136+
ln 2 = 0.54271 02775 75071 73632 57117 07316 30007 71366 53640+
ln 3 = 1.06237 24752 55006 05227 32440 63065 25012 35574 55337+
ln 10 = 2.23273 06735 52524 25405 56512 66542 56026 46050 50705+
1/ln 2 = 1.34252 16624 53405 77027 35750 37766 40644 35175 04353+
1/ln 10 = 0.33626 75425 11562 41614 52325 33525 27655 14756 06220-
π = 3.11037 55242 10264 30215 14230 63050 56006 70163 21122+
1° = π/180 = 0.01073 72152 11224 72344 25603 54276 63351 22056 11544+
1/π = 0.24276 30155 62344 20251 23760 47257 50765 15156 70067-
π² = 11.67517 14467 62135 71322 25561 15466 30021 40654 34103-
√π = Γ(1/2) = 1.61337 61106 64736 65247 47035 40510 15273 34470 17762-
Γ(1/3) = 2.53347 35234 51013 61316 73106 47644 54653 00106 66046-
Γ(2/3) = 1.26523 57112 14154 74312 54572 37655 60126 23231 02452+
e = 2.55760 52130 50535 51246 52773 42542 00471 72363 61661+
1/e = 0.27426 53066 13167 46761 52726 75436 02440 52371 03355+
e² = 7.30714 45615 23355 33460 63507 35040 32664 25356 50217+
γ = 0.44742 14770 67666 06172 23215 74376 01002 51313 25521-
ln π = 1.11206 40443 47503 36413 65374 52661 52410 37511 46057+
φ = 1.47433 57156 27751 23701 27634 71401 40271 66710 15010+
e^γ = 1.61772 13452 61152 65761 22477 36553 53327 17554 21260+
e^(π/4) = 2.14275 31512 16162 52370 35530 11342 53525 44307 02171-
sin 1 = 0.65665 24436 04414 73402 03067 23644 11612 07474 14505-
cos 1 = 0.42450 50037 32406 42711 07022 14666 27320 70675 12321+
−ζ′(2) = 0.74001 45144 53253 42362 42107 23350 50074 46100 27706+
ζ(3) = 1.14735 00023 60014 20470 15613 42561 31715 10177 06614+
ln φ = 0.36630 26256 61213 01145 13700 41004 52264 30700 40646+
1/ln φ = 2.04776 60111 17144 41512 11436 16575 00355 43630 40651+
−ln ln 2 = 0.27351 71233 67265 63650 17401 56637 26334 31455 57005-
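Table 2 can be spot-checked with exact integer arithmetic: multiply by $8^{45}$, round, and print in octal. The sketch assumes the same '+'/'-' convention as Table 1. It handles rationals via `fractions.Fraction` and square roots via `math.isqrt`, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import floor, isqrt

PLACES = 45
SCALE = 8 ** PLACES


def octal_row(rounded: int, too_high: bool) -> str:
    """Format round(x * 8**45) like a Table 2 entry; '-' means the printed value is too high."""
    s = format(rounded, "o")
    whole, frac = s[:-PLACES] or "0", s[-PLACES:].rjust(PLACES, "0")
    groups = " ".join(frac[i:i + 5] for i in range(0, PLACES, 5))
    return f"{whole}.{groups}{'-' if too_high else '+'}"


def octal_fraction(x: Fraction) -> str:
    scaled = x * SCALE
    rounded = floor(scaled + Fraction(1, 2))
    return octal_row(rounded, rounded > scaled)


def octal_sqrt(n: int) -> str:
    """sqrt(n) for a non-square n, rounded to 45 octal places."""
    lo = isqrt(n * SCALE * SCALE)                     # floor(sqrt(n) * 8**45)
    up = (2 * lo + 1) ** 2 < 4 * n * SCALE * SCALE    # fractional part above 1/2?
    return octal_row(lo + up, up)


print(octal_fraction(Fraction(1, 10)))
print(octal_sqrt(2))
```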
Several interesting constants without common names have arisen in connection with the analyses of sorting and searching algorithms. These constants have been evaluated to 40 decimal places in Eqs. 5.2.3-(19) and 6.5-(6), and in the answers to exercises 5.2.3-27, 5.2.4-13, 5.2.4-23, 6.2.2-49, 6.2.3-7, 6.2.3-8, and 6.3-26.


Table 3

VALUES OF HARMONIC NUMBERS, BERNOULLI NUMBERS,
AND FIBONACCI NUMBERS, FOR SMALL VALUES OF n

 n  H_n                          B_n                     F_n   n
 0  0                            1                         0   0
 1  1                            -1/2                      1   1
 2  3/2                          1/6                       1   2
 3  11/6                         0                         2   3
 4  25/12                        -1/30                     3   4
 5  137/60                       0                         5   5
 6  49/20                        1/42                      8   6
 7  363/140                      0                        13   7
 8  761/280                      -1/30                    21   8
 9  7129/2520                    0                        34   9
10  7381/2520                    5/66                     55  10
11  83711/27720                  0                        89  11
12  86021/27720                  -691/2730               144  12
13  1145993/360360               0                       233  13
14  1171733/360360               7/6                     377  14
15  1195757/360360               0                       610  15
16  2436559/720720               -3617/510               987  16
17  42142223/12252240            0                      1597  17
18  14274301/4084080             43867/798              2584  18
19  275295799/77597520           0                      4181  19
20  55835135/15519504            -174611/330            6765  20
21  18858053/5173168             0                     10946  21
22  19093197/5173168             854513/138            17711  22
23  444316699/118982864          0                     28657  23
24  1347822955/356948592         -236364091/2730       46368  24
25  34052522467/8923714800       0                     75025  25
26  34395742267/8923714800       8553103/6            121393  26
27  312536252003/80313433200     0                    196418  27
28  315404588903/80313433200     -23749461029/870     317811  28
29  9227046511387/2329089562800  0                    514229  29
30  9304682830147/2329089562800  8615841276005/14322  832040  30
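The entries of Table 3 can be regenerated with exact fractions. The sketch uses the convention $B_1 = -1/2$ shown in the table and the standard recurrence $\sum_{0\le k\le n}\binom{n+1}{k}B_k = 0$ for $n \ge 1$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def harmonic(n: int) -> Fraction:
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def bernoulli(N: int) -> list:
    """[B_0, ..., B_N] from sum_{k<=n} C(n+1, k) B_k = 0, so B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, N + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B


def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


B = bernoulli(30)
for n in (0, 1, 2, 12, 30):
    print(n, harmonic(n), B[n], fib(n))
```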
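For any $x$, let $H_x = \sum_{n\ge1}\bigl(\frac1n - \frac1{n+x}\bigr)$. Then

$$\begin{aligned}
H_{1/2} &= 2 - 2\ln 2,\\
H_{1/3} &= 3 - \tfrac12\pi/\sqrt3 - \tfrac32\ln 3,\\
H_{2/3} &= \tfrac32 + \tfrac12\pi/\sqrt3 - \tfrac32\ln 3,\\
H_{1/4} &= 4 - \tfrac12\pi - 3\ln 2,\\
H_{3/4} &= \tfrac43 + \tfrac12\pi - 3\ln 2,\\
H_{1/5} &= 5 - \tfrac12\pi\phi^{3/2}5^{-1/4} - \tfrac54\ln 5 - \tfrac12\sqrt5\ln\phi,\\
H_{2/5} &= \tfrac52 - \tfrac12\pi\phi^{-3/2}5^{-1/4} - \tfrac54\ln 5 + \tfrac12\sqrt5\ln\phi,\\
H_{3/5} &= \tfrac53 + \tfrac12\pi\phi^{-3/2}5^{-1/4} - \tfrac54\ln 5 + \tfrac12\sqrt5\ln\phi,\\
H_{4/5} &= \tfrac54 + \tfrac12\pi\phi^{3/2}5^{-1/4} - \tfrac54\ln 5 - \tfrac12\sqrt5\ln\phi,\\
H_{1/6} &= 6 - \tfrac12\pi\sqrt3 - 2\ln 2 - \tfrac32\ln 3,\\
H_{5/6} &= \tfrac65 + \tfrac12\pi\sqrt3 - 2\ln 2 - \tfrac32\ln 3,
\end{aligned}$$

and, in general, when $0 < p < q$ (see exercise 1.2.9-19),

$$H_{p/q} = \frac qp - \frac\pi2\cot\frac pq\pi - \ln 2q + 2\sum_{1\le n<q/2}\cos\frac{2pn}{q}\pi\cdot\ln\sin\frac nq\pi.$$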
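The closed forms can be checked numerically. The sketch evaluates $H_x$ through $H_x = H_{x+N} - \sum_{1\le k\le N} 1/(x+k)$, using the standard asymptotic series for $H_y$ when y is large and taking γ from Table 1. It compares the result with both the special cases and the general formula. N = 50 and the function names are arbitrary choices.

```python
import math

GAMMA = 0.57721566490153286060   # Euler's constant, from Table 1


def H(x: float, shift: int = 50) -> float:
    """H_x = H_{x+N} - sum_{1<=k<=N} 1/(x+k), with H_y from its asymptotic series."""
    y = x + shift
    hy = (math.log(y) + GAMMA + 1 / (2 * y) - 1 / (12 * y ** 2)
          + 1 / (120 * y ** 4) - 1 / (252 * y ** 6))
    return hy - sum(1 / (x + k) for k in range(1, shift + 1))


def H_pq(p: int, q: int) -> float:
    """The general formula for H_{p/q}, 0 < p < q."""
    s = sum(math.cos(2 * p * n * math.pi / q) * math.log(math.sin(n * math.pi / q))
            for n in range(1, (q + 1) // 2))          # 1 <= n < q/2
    return q / p - (math.pi / 2) / math.tan(p * math.pi / q) - math.log(2 * q) + 2 * s


print(H(0.5), 2 - 2 * math.log(2))
print(H(0.4), H_pq(2, 5))
```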
INDEX TO NOTATIONS

In the following formulas, letters that are not further qualified have the following significance:

j, k   integer-valued arithmetic expression
m, n   nonnegative integer-valued arithmetic expression
x, y   real-valued arithmetic expression
z   complex-valued arithmetic expression
f   real-valued or complex-valued function
P   pointer-valued expression (either Λ or a computer address)
S, T   set or multiset
α, β   strings of symbols

Formal symbolism   Meaning   Where defined

$V \gets E$   give variable V the value of expression E   1.1
$U \leftrightarrow V$   interchange the values of variables U and V   1.1
$A_n$ or $A[n]$   the nth element of linear array A   1.1
$A_{mn}$ or $A[m,n]$   the element in row m and column n of rectangular array A   1.1
NODE(P)   the node (group of variables that are individually distinguished by their field names) whose address is P, assuming that $P \ne \Lambda$   2.1
F(P)   the variable in NODE(P) whose field name is F   2.1
CONTENTS(P)   contents of computer word whose address is P   2.1
LOC(V)   address of variable V within a computer   2.1
$P \Leftarrow$ AVAIL   set the value of pointer variable P to the address of a new node   2.2.3
AVAIL $\Leftarrow P$   return NODE(P) to free storage; all its fields lose their identity   2.2.3
top(S)   node at the top of a nonempty stack S   2.2.1
$X \Leftarrow S$   pop up S to X: set $X \gets$ top(S); then delete top(S) from nonempty stack S   2.2.1
$S \Leftarrow X$   push down X onto S: insert the value X as a new entry on top of stack S   2.2.1
$(B \Rightarrow E;\ E')$   conditional expression: denotes E if B is true, E′ if B is false
$[B]$   characteristic function of condition B: $(B \Rightarrow 1;\ 0)$   1.2.3
$\delta_{kj}$   Kronecker delta: $[j = k]$   1.2.3
$[z^n]\,g(z)$   coefficient of $z^n$ in power series $g(z)$   1.2.9
$\sum_{R(k)} f(k)$   sum of all f(k) such that the variable k is an integer and relation R(k) is true   1.2.3
$\prod_{R(k)} f(k)$   product of all f(k) such that the variable k is an integer and relation R(k) is true   1.2.3
$\min_{R(k)} f(k)$   minimum value of all f(k) such that the variable k is an integer and relation R(k) is true   1.2.3
$\max_{R(k)} f(k)$   maximum value of all f(k) such that the variable k is an integer and relation R(k) is true   1.2.3
$j \backslash k$   j divides k: $k \bmod j = 0$ and $j > 0$   1.2.4
$S \setminus T$   set difference: $\{a \mid a \text{ in } S \text{ and } a \text{ not in } T\}$
$\gcd(j,k)$   greatest common divisor of j and k: $\bigl(j = k = 0 \Rightarrow 0;\ \max_{d\backslash j,\ d\backslash k} d\bigr)$   1.1
$j \perp k$   j is relatively prime to k: $\gcd(j,k) = 1$   1.2.4
$A^T$   transpose of rectangular array A: $A^T[j,k] = A[k,j]$
$\alpha^R$   left-right reversal of α
$x^y$   x to the y power (when x is positive)   1.2.2
$x^k$   x to the kth power: $\bigl(k \ge 0 \Rightarrow \prod_{0\le j<k} x;\ 1/x^{-k}\bigr)$   1.2.2
$x^{\overline{k}}$   x to the k rising: $\Gamma(x+k)/\Gamma(x) = \bigl(k \ge 0 \Rightarrow \prod_{0\le j<k}(x+j);\ 1/(x+k)^{\overline{-k}}\bigr)$   1.2.5
$x^{\underline{k}}$   x to the k falling: $x!/(x-k)! = \bigl(k \ge 0 \Rightarrow \prod_{0\le j<k}(x-j);\ 1/(x-k)^{\underline{-k}}\bigr)$   1.2.5
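The definitions of rising and falling powers for negative exponents, and of binomial coefficients via falling powers, translate directly into code. Here is a sketch with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def falling(x, k: int):
    """x to the k falling, extended to k < 0 by 1/(x - k) to the (-k) falling."""
    if k < 0:
        return 1 / falling(x - k, -k)
    result = Fraction(1)
    for j in range(k):
        result *= x - j
    return result


def rising(x, k: int):
    """x to the k rising, extended to k < 0 by 1/(x + k) to the (-k) rising."""
    if k < 0:
        return 1 / rising(x + k, -k)
    result = Fraction(1)
    for j in range(k):
        result *= x + j
    return result


def binomial(x, k: int):
    """(k < 0 => 0; x to the k falling / k!)."""
    return Fraction(0) if k < 0 else falling(x, k) / factorial(k)


print(falling(6, -2), rising(6, -2), binomial(Fraction(-1, 2), 2))  # 1/56 1/20 3/8
```

For integer arguments the results agree with $x!/(x-k)!$ and $\Gamma(x+k)/\Gamma(x)$, as the table entries state.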
$n!$   n factorial: $\Gamma(n+1) = n^{\underline{n}}$   1.2.5
$\binom{x}{k}$   binomial coefficient: $(k < 0 \Rightarrow 0;\ x^{\underline{k}}/k!)$   1.2.6
$\binom{n}{n_1, n_2, \ldots, n_m}$   multinomial coefficient (defined only when $n = n_1 + n_2 + \cdots + n_m$)   1.2.6
$\left[{n \atop m}\right]$   Stirling number of the first kind: $\sum_{0<k_1<k_2<\cdots<k_{n-m}<n} k_1k_2\ldots k_{n-m}$   1.2.6
$\left\{{n \atop m}\right\}$   Stirling number of the second kind: $\sum_{1\le k_1\le k_2\le\cdots\le k_{n-m}\le m} k_1k_2\ldots k_{n-m}$   1.2.6
$\{a \mid R(a)\}$   set of all a such that the relation R(a) is true
$\{a_k\}$   the set or multiset $\{a_k \mid 1 \le k \le n\}$
$\{x\}$   fractional part (used in contexts where a real value, not a set, is implied): $x - \lfloor x\rfloor$   1.2.11.2
$[a\,.\,.\,b]$   closed interval: $\{x \mid a \le x \le b\}$   1.2.2
$(a\,.\,.\,b)$   open interval: $\{x \mid a < x < b\}$   1.2.2
$[a\,.\,.\,b)$   half-open interval: $\{x \mid a \le x < b\}$   1.2.2
$(a\,.\,.\,b]$   half-closed interval: $\{x \mid a < x \le b\}$   1.2.2
$|S|$   cardinality: the number of elements in set S
$|x|$   absolute value of x: $(x \ge 0 \Rightarrow x;\ -x)$
$|\alpha|$   length of α
$\lfloor x\rfloor$   floor of x, greatest integer function: $\max_{k\le x} k$   1.2.4
$\lceil x\rceil$   ceiling of x, least integer function: $\min_{k\ge x} k$   1.2.4
$x \bmod y$   mod function: $(y = 0 \Rightarrow x;\ x - y\lfloor x/y\rfloor)$   1.2.4
$x \equiv x' \ (\text{modulo } y)$   relation of congruence: $x \bmod y = x' \bmod y$   1.2.4
$O(f(n))$   big-oh of f(n), as the variable $n \to \infty$   1.2.11.1
$O(f(z))$   big-oh of f(z), as the variable $z \to 0$   1.2.11.1
$\Omega(f(n))$   big-omega of f(n), as the variable $n \to \infty$   1.2.11.1
$\Theta(f(n))$   big-theta of f(n), as the variable $n \to \infty$   1.2.11.1
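The sum formulas for Stirling numbers can be compared with the usual recurrences $\left[{n+1 \atop m}\right] = n\left[{n \atop m}\right] + \left[{n \atop m-1}\right]$ and $\left\{{n+1 \atop m}\right\} = m\left\{{n \atop m}\right\} + \left\{{n \atop m-1}\right\}$. Here is a brute-force sketch, practical only for small n.

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def stirling_first(n: int, m: int) -> int:
    """Sum of k_1 k_2 ... k_{n-m} over 0 < k_1 < k_2 < ... < k_{n-m} < n."""
    if m > n:
        return 0
    return sum(prod(ks) for ks in combinations(range(1, n), n - m))


def stirling_second(n: int, m: int) -> int:
    """Sum of k_1 k_2 ... k_{n-m} over 1 <= k_1 <= k_2 <= ... <= k_{n-m} <= m."""
    if m > n:
        return 0
    return sum(prod(ks) for ks in combinations_with_replacement(range(1, m + 1), n - m))


print(stirling_first(5, 2), stirling_second(5, 2))  # 50 15
```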
$\log_b x$   logarithm, base b, of x (when $x > 0$, $b > 0$, and $b \ne 1$): the y such that $x = b^y$   1.2.2
$\ln x$   natural logarithm: $\log_e x$   1.2.2
$\lg x$   binary logarithm: $\log_2 x$   1.2.2
$\exp x$   exponential of x: $e^x$   1.2.9
$\langle X_n\rangle$   the infinite sequence $X_0, X_1, X_2, \ldots$ (here the letter n is part of the symbolism)   1.2.9
$f'(x)$   derivative of f at x   1.2.9
$f''(x)$   second derivative of f at x   1.2.10
$f^{(n)}(x)$   nth derivative: $(n = 0 \Rightarrow f(x);\ g'(x))$, where $g(x) = f^{(n-1)}(x)$   1.2.11.2
$H_n^{(x)}$   harmonic number of order x: $\sum_{1\le k\le n} 1/k^x$   1.2.7
$H_n$   harmonic number: $H_n^{(1)}$   1.2.7
$F_n$   Fibonacci number: $(n \le 1 \Rightarrow n;\ F_{n-1} + F_{n-2})$   1.2.8
$B_n$   Bernoulli number: $n!\,[z^n]\,z/(e^z - 1)$   1.2.11.2
$\det(A)$   determinant of square matrix A   1.2.3
$\operatorname{sign}(x)$   sign of x: $[x > 0] - [x < 0]$
$\zeta(x)$   zeta function: $\lim_{n\to\infty} H_n^{(x)}$ (when $x > 1$)   1.2.7
$\Gamma(x)$   gamma function: $(x-1)! = \gamma(x, \infty)$   1.2.5
$\gamma(x,y)$   incomplete gamma function: $\int_0^y e^{-t}t^{x-1}\,dt$   1.2.11.3
$\gamma$   Euler’s constant: $\lim_{n\to\infty}(H_n - \ln n)$   1.2.7
$e$   base of natural logarithms: $\sum_{n\ge0} 1/n!$   1.2.2
$\pi$   circle ratio: $4\sum_{n\ge0}(-1)^n/(2n+1)$   1.2.2
$\infty$   infinity: larger than any number
$\Lambda$   null link (pointer to no address)   2.1
$\epsilon$   empty string (string of length zero)
$\emptyset$   empty set (set with no elements)
$\phi$   golden ratio: $\frac12(1+\sqrt5)$   1.2.8
$\varphi(n)$   Euler’s totient function: $\sum_{0\le k<n}[k \perp n]$   1.2.4
$x \approx y$   x is approximately equal to y   1.2.5
$\Pr(S(X))$   probability that statement S(X) is true, for random values of X   1.2.10
$\mathrm{E}\,X$   expected value of X: $\sum_x x\Pr(X = x)$   1.2.10
mean(g)   mean value of the probability distribution represented by generating function g: $g'(1)$   1.2.10
var(g)   variance of the probability distribution represented by generating function g: $g''(1) + g'(1) - g'(1)^2$   1.2.10
(min $x_1$, ave $x_2$, max $x_3$, dev $x_4$)   a random variable having minimum value $x_1$, average (expected) value $x_2$, maximum value $x_3$, standard deviation $x_4$   1.2.10
$\Re z$   real part of z   1.2.2
$\Im z$   imaginary part of z   1.2.2
$\bar z$   complex conjugate: $\Re z - i\,\Im z$   1.2.2
$(\ldots a_1a_0.a_{-1}\ldots)_b$   radix-b positional notation: $\sum_k a_kb^k$   4.1
$/\!/x_1, x_2, \ldots, x_n/\!/$   continued fraction: $1/(x_1 + 1/(x_2 + 1/(\cdots + 1/(x_n)\ldots)))$   4.5.3
$\alpha \intercal \beta$   intercalation product   5.1.2
$S \uplus T$   multiset sum; e.g., $\{a, b\} \uplus \{a, c\} = \{a, a, b, c\}$   4.6.3
$f(x)\big|_a^b$   function growth: $f(b) - f(a)$
▮   end of algorithm, program, or proof   1.1
⊔   one blank space   1.3.1
rA   register A (accumulator) of MIX   1.3.1
rX   register X (extension) of MIX   1.3.1
rI1, ..., rI6   (index) registers I1, ..., I6 of MIX   1.3.1
rJ   (jump) register J of MIX   1.3.1
(L:R)   partial field of MIX word, $0 \le L \le R \le 5$   1.3.1
OP ADDRESS,I(F)   notation for MIX instruction   1.3.1, 1.3.2
u   unit of time in MIX   1.3.1
*   “self” in MIXAL   1.3.2
0F, 1F, 2F, ..., 9F   “forward” local symbol in MIXAL   1.3.2
0B, 1B, 2B, ..., 9B   “backward” local symbol in MIXAL   1.3.2
0H, 1H, 2H, ..., 9H   “here” local symbol in MIXAL   1.3.2
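The mean(g) and var(g) entries can be illustrated with an explicit probability generating function, given here as its list of coefficients $p_0, p_1, \ldots$. Here is a sketch that uses a fair die as the example; the function name is hypothetical.

```python
from fractions import Fraction


def mean_var(coeffs):
    """mean(g) = g'(1) and var(g) = g''(1) + g'(1) - g'(1)**2 for g(z) = sum p_k z**k."""
    g1 = sum(k * p for k, p in enumerate(coeffs))
    g2 = sum(k * (k - 1) * p for k, p in enumerate(coeffs))
    return g1, g2 + g1 - g1 ** 2


die = [Fraction(0)] + [Fraction(1, 6)] * 6      # g(z) = (z + z^2 + ... + z^6)/6
print(*mean_var(die))  # 7/2 35/12
```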
One of my mathematician friends told me he would be willing
to  recognize  computer  science  as  a worthwhile  field  of  study 
as  soon  as  it  contains  1,000  deep  theorems. 
This  criterion  should  obviously  be  changed  to  include  algorithms 
as  well  as  theorems — say  500  deep  theorems  and  500  deep  algorithms. 
But  even  so,  it  is  clear  that  computer  science  today  doesn’t  measure  up 
to such a test, if “deep” means that a brilliant person would need
many  months  to  discover  the  theorem  or  the  algorithm. 

. . . The potential for “1,000 deep results” is there,
but  only  perhaps  50  have  been  discovered  so  far. 
— DONALD  E.  KNUTH,  Computer  Science  and  Mathematics  (1973) 
If you don’t find it in the Index,
look  very  carefully  through  the  entire  catalogue. 

— SEARS,  ROEBUCK  AND  CO.,  Consumers  Guide  (1897) 

When  an  index  entry  refers  to  a page  containing  a relevant  exercise,  see  also  the  answer  to 
that  exercise  for  further  information.  An  answer  page  is  not  indexed  here  unless  it  refers  to  a 
topic  not  included  in  the  statement  of  the  exercise. 


−∞, 4, 142-144, 156, 214, 663-664, 685, 707.
0-1 matrices, 660.
0-1 principle, 223, 224, 245, 667, 668.
1/3-2/3 conjecture, 197.
2-3 trees, 476-477, 480, 483, 715.
(2,4)-trees, 477.
2-d trees, 565.
2-descending sequence, 451.
2-ordered permutations, 86-88, 103, 112-113, 134.
80-20 rule, 400-401, 405, 456.
∞, 4, 138-139, 257-258, 263, 521, 624-625, 646.
  as sentinel, 159, 252, 308, 324.
ζ(x) (number of 0s), 235; see also Zeta function.
ν(x) (number of 1s), 235, 643, 644, 717.
π (circle ratio), 372, 520, 748-749.
  as “random” example, 17, 370, 385, 547, 552, 733.
φ (golden ratio), xiv, 138, 517-518, 748-749.
(a, b)-trees, 477.
Abbreviated keys, 512, 551.
Abel, Niels Henrik, binomial formula, 552.
  limit theorem, 740.
Abraham, Chacko Thakadiparambil, 578.
Absorption laws, 239.
Adaptive sorting, 389.
Addition of apples to oranges, 401.
Addition of polynomials, 165.
Addition to a list, see Insertion.
Address calculation sorting, 99-102, 104-105, 176-177, 380, 389, 698.
Address table sorting, 74-75, 80.
Adelson-Velsky, Georgii Maximovich (Адельсон-Вельский, Георгий Максимович), 459, 460.
Adjacent transpositions, 13, 240, 403, 404, 640, 668.
Adversaries, 198-202, 205-207, 209-210, 218, 671.
AF-heaps, 152.
Agarwal, Ramesh Chandra, 359, 389.
Agenda, see Priority queue.
Aggarwal, Alok, 698.


Aho, Alfred Vaino, 476, 479, 652.
Aigner, Martin, 241.
Airy, George Biddle, function, 611.
Ajtai, Miklós, 228, 673, 740.
al-Khwārizmī, Abū ʿAbd Allāh Muḥammad ibn Mūsā, 8.
Aldous, David John, 728.
Alekseev, Vladimir Evgenievich (Алексеев, Владимир Евгеньевич), 232, 233, 237, 238.
Alexanderson, Gerald Lee, 599.
ALGOL language, 454.
Algorithms, analysis of, see Analysis,
  comparison of, see Comparison,
  proof of, see Proof.
Allen, Brian Richard, 478.
Allen, Charles Grant Blairfindie, 558.
Alphabetic binary encoding, 452-454.
Alphabetic order, 7, 420-421, 453.
Altenkamp, Doris, 713.
Alternating runs, 46, 607.
Amble, Ole, 556.
Amdahl, Gene Myron, 547.
American Library Association rules, 7-8.
AMM: American Mathematical Monthly, published by the Mathematical Association of America since 1894.
Amortized cost, 478, 549.
Amphisbaenic sort, 347, 388.
Anagrams, 9, see also Permutations of a multiset.
Analysis of algorithms, 3, 77-78, 80, 82, 85-95, 100-105, 108-109, 118-122, 140, 152-158, 161-162, 167-168, 174-177, 185-186, 255-256, 259-266, 274-279, 285-287, 294-299, 330-335, 339-343, 379, 382, 387-388, 397-408, 412-413, 424-425, 430-431, 454-458, 466-471, 479-480, 485-486, 490, 500-512, 524-525, 534-539, 543-544, 552-557, 565-566, 576, 619, see also Complexity analysis.
Analytical Engine, 180.
AND (bitwise and), 111, 134, 531, 589, 592, 629.
André, Antoine Désiré, 68, 605.
Anti-stable sorting, 347, 615, 650.
Antisymmetric function, 66.
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Anuyogadvarasutra  "P ),  23. 

Apollonius  Sophista,  son  of  Archibius 
( A.7toXX<ivioc  6 Eocpio-crig,  too 
Apx'.piou),  421. 

Appell,  Paul  Emile,  679. 

Approximate  equality,  9,  394-395. 

Aragon,  Cecilia  Rodriguez,  478. 

Archimedes  of  Syracuse  ( ’Apxip.'nSi'K 
6 Eupaxouaio?),  13. 
solids,  593. 

Arge,  Lars  Allan,  489. 

Argument,  392. 

Arisawa,  Makoto  574. 

Arithmetic  overflow,  6,  519,  585. 

Arithmetic  progressions,  517. 

Armstrong,  Philip  Nye,  225,  244,  245,  675. 
Arora, Sant Ram, 455.
Arpaci-Dusseau,  Andrea  Carol,  390. 
Arpaci-Dusseau,  Remzi  Hussein,  390. 

Ascents  of  a permutation,  35. 

Ashenhurst,  Robert  Lovett,  344,  348. 

Askey,  Richard  Allen,  601. 

Associated  Stirling  numbers,  266. 

Associative  block  designs,  574-575,  582. 
Associative  law,  24,  35,  239,  461,  592. 
Associative  memories,  392,  579. 

Asymptotic methods, 41-42, 45, 47,
62-64, 69, 128-134, 136-138, 194-195,
286-287, 405, 479, 490, 504-506,
509-510, 555-557, 644.
limits of applicability, 318.

Attitude,  73. 

Attributes,  559. 
binary, 567-576.
compound,  564,  566-567. 
auf  der  Heide,  see  Meyer  auf  der  Heide. 
Automatic  programming,  387. 

AVL  trees,  459,  see  Balanced  trees. 

Avni, Haim, 707.

B-trees, 482-491, 549, 563.

B+-trees, 486.

B*-trees, 488.

Babbage,  Charles,  180. 

Baber,  Robert  Laurence,  704. 

Babylonian  mathematics,  420. 

Bachrach (= Gilad-Bachrach), Ran, 403.

Backward  reading,  see  Read-backward. 
Baeza-Yates, Ricardo Alberto, 489, 713, 715.
Bafna, Vineet, 615.

Baik, Jinho, 611.

Bailey,  Norman  Thomas  John,  740. 

Balance  factor,  459,  479. 

Balanced filing, 576-578, 581.

Balanced  incomplete  block  designs,  576. 
Balanced merging, 248-251, 267, 297,
299-300, 311, 325, 333, 386-387, 587.
with rewind overlap, 297.

Balanced  radix  sorting,  343,  386. 


Balanced trees, 150-151, 458-491,
547, 592, 713.
weight-balanced, 476, 480.

Balancing  a binary  tree,  480. 

Balancing a k-d tree, 566.

Balbine,  Guy  de,  528. 

Ball,  Walter  William  Rouse,  593. 

Ballot  problem,  61,  66. 

Barnett,  John  Keith  Ross,  168. 

Barton,  David  Elliott,  44,  602,  603,  605. 
Barve, Rakesh Dilip, 371.
Barycentric  coordinates,  437. 

Basic  query,  569,  574-576,  579-582. 
Batcher,  Kenneth  Edward,  111,  223,  226, 
230-232,  235,  381,  389,  667. 

Batching,  98,  159,  560,  563. 

Baudet,  Gerard  Maurice,  671. 

Bayer,  Paul  Joseph,  454,  458. 

Bayer, Rudolf, 477, 482, 483, 487, 490, 721.
Bayes,  Anthony  John,  346. 

Bell,  Colin  James,  337. 

Bell,  David  Arthur,  167,  388,  647. 

Bell,  James  Richard,  531,  532. 

Bellman,  Richard  Ernest,  ix. 

Ben-Amram, Amir Mordechai, 181.

Bencher,  Dennis  Leo,  312,  313,  316. 
Benchmarks,  389-391. 

Bender,  Edward  Anton,  605,  609,  696. 
Bennett,  Brian  Thomas,  378. 

Bennett,  Mary  Katherine,  718-719. 

Bent,  Samuel  Watkins,  213,  478,  666. 
Bentley, Jon Louis, 122, 403, 512,
565-566, 633, 635.

Bergeron,  Anne,  615. 

Berkeley,  George,  782. 

Berkovich, Simon Yakovlevich (Беркович,
Семен Яковлевич), 496.

Berman,  Joel  David,  669. 

Berners-Lee,  Conway  Maurice,  98,  453. 
Bernoulli,  Jacques  (=  Jakob  = James), 
numbers,  506,  602,  637,  750. 
numbers,  calculation  of,  611. 

Berra,  Lawrence  Peter  “Yogi”,  476. 
Bertrand, Joseph Louis François, 605.
Best-fit  allocation,  480. 

Best  possible,  180. 

Beta  distribution,  586. 

Betz,  Bernard  Keith,  268,  288. 

Beus,  Hiram  Lynn,  245-246. 

Bhaskara II, Acarya, son of Mahesvara, 23.

Bhattacharya, Kailash Nath, 746.

Biased  trees,  478. 

Bienaymé, Irénée Jules, 605.

Bierce,  Ambrose  Gwinnett,  558. 

BINAC  computer,  386. 

Binary  attributes,  567-576. 

Binary  computers,  411. 
Binary insertion sort, 82-83, 97, 183,
186, 193, 225, 386.

Binary  merging,  203-204,  206. 

Binary  quicksort,  see  Radix  exchange. 
Binary  recurrences,  135,  167,  630,  644,  653. 
Binary  search,  82,  203,  409-417,  420, 
422-423,  425,  435,  546,  643. 
uniform,  414-416,  423. 

Binary  search  trees,  426-481,  565. 
optimum,  436-454,  456-458,  478. 
pessimum,  457,  711. 

Binary  tree:  Either  empty,  or  a root 
node  and  its  left  and  right  binary 
subtrees;  see  also  Complete  binary 
tree, Extended binary tree.
enumeration,  60-61,  157,  295,  467,  479. 
triply  linked,  158,  475. 

Binary  tries,  500-502. 

Binomial  coefficients,  30-31,  87,  190. 
Binomial  probability  distribution,  100-101, 
341,  539,  555. 

Binomial  queues,  152. 

Binomial  transforms,  136-137,  508. 
Biquinary  number  system,  694. 

Birkhoff,  Garrett,  719. 

Birthday  paradox,  513,  549. 

Bisection,  410,  see  Binary  search. 

BIT:  Nordisk  Tidskrift  for  Informations- 
Behandling,  an  international  journal 
published  in  Scandinavia  since  1961. 
Bit  reversal,  621. 

Bit  strings,  561-562,  572-573. 

Bit  vectors,  235. 

Bitner,  James  Richard,  403,  478,  703. 
Bitonic  sequence,  231. 

Bitonic  sorter,  230-232,  243,  244. 

Bits  of  information,  183,  442-443. 

Bitwise  and,  111,  134,  531,  589,  592,  629. 
Bitwise  or,  529,  571. 

Björner, Anders, 609.

Blake, Ian Fraser, 740.

Blanks,  algebra  of,  592. 

Bleier,  Robert  Edward,  578. 

Block  designs,  576-578. 

Blocks  of  records,  258. 
on  disk,  358-359,  369. 
on  tape,  318-320. 

Bloom,  Burton  Howard,  572-573,  578,  745. 
Blum,  Manuel,  214. 

Blum,  Norbert  Karl,  718. 

Boas,  Peter  van  Emde,  152. 

Boehm  McGraw,  Elaine  Marie  [=  Elaine 
Marie  Hall],  547. 

Boerner,  Hermann,  669. 

Boesset,  Antoine  de,  24. 

Bollobás, Béla, 645.

Book of Creation (ספר יצירה), 23.

Boolean  queries,  559,  562,  564. 

Booth, Andrew Donald, 396, 400,
422, 453, 454.


Boothroyd,  John,  617. 

Borwein,  Peter  Benjamin,  155. 

Bose, Raj Chandra, 226, 578, 746.

Bostic,  Keith,  177,  652. 

Bottenbruch,  Hermann,  422,  425. 
Bouricius,  Willard  Gail,  195,  223. 

Bourne,  Charles  Percy,  395,  578. 
Brandwood,  Leonard,  400. 

Bravais,  Auguste,  518. 

Bravais,  Louis,  518. 

Brawn,  Barbara  Severa,  698. 

Breaux,  Nancy  Ann  Eleser,  680. 

Brent,  Richard  Peirce,  532-533,  546,  718. 
Briandais,  Rene  Edward  de  la,  494. 
Brouwer,  Andries  Evert,  575,  747. 

Brown,  John,  7. 

Brown,  Mark  Robbin,  152,  479. 

Brown,  Randy  Lee,  152. 

Brown,  William  Stanley,  157. 

Bruhat, François, order, 628, 670.
weak, 13, 19, 22, 628, 670.

Bruijn, Nicolaas Govert de, 130, 138,
602, 668, 670, 671, 744.

Bubble sort, 106-109, 128-130, 134,
140, 222-223, 240, 244, 246-247,
348-349, 380, 387, 390.
multihead, 244-245.

Buchholz,  Werner,  396,  548. 

Buchsbaum,  Adam  Louis,  481. 

Bucket  sorting,  169. 

Buckets,  541-544,  547-548,  555,  564. 
Buffering,  339-343,  387,  488. 

size of buffers, 332-333, 349, 360,
367-368, 376-377.

Bulk  memory,  356,  see  Disk  storage. 
Burge,  William  Herbert,  279,  297,  337. 
Burkhard,  Walter  Austin,  747. 

Burton,  Robert,  v. 

Butterfly  network,  227,  236-237. 

C language,  426. 

Cache  memory,  389. 

CACM: Communications of the ACM,
a publication  of  the  Association  for 
Computing  Machinery  since  1958. 
Calendar  queues,  152. 

Cancellation  laws,  24. 

Canfield,  Earl  Rodney,  673. 

Cards,  see  also  Playing  cards, 
edge-notched,  1,  569-570,  578. 
feature,  569-570,  578. 
machines  for  sorting,  169-170,  175, 
383-385. 

Carlitz,  Leonard,  39,  47,  613,  620. 

Caron,  Jacques,  279,  280,  286,  287. 
Carries,  691. 

Carroll,  Lewis  (=  Dodgson,  Charles 
Lutwidge),  207-208,  216,  584. 

Carter,  John  Lawrence,  519,  557,  743. 
Carter,  William  Caswell,  279,  288,  297. 
Cartesian  trees,  478. 


762  INDEX  AND  GLOSSARY 


Cascade merge, 288-300, 311, 326,
333, 338, 389.
read-backward, 328, 334.
with rewind overlap, 299, 327,
333-334, 342.

Cascade  numbers,  294-299. 

Cascading  pseudo-radix  sort,  347. 

Catalan, Eugène Charles, numbers, 61, 295.
Catenated  search,  407. 

Cawdrey  (=  Cawdry),  Robert,  421. 

Cayley,  Arthur,  628,  653. 

Celis  Villegas,  Pedro,  741. 

Cells,  564. 

Census,  383-386,  395. 

Cesari,  Yves,  193,  279. 

Chaining,  520-525,  542-544,  547,  553,  557. 

to  reduce  seek  time,  368-369. 

Chakravarti, Gurugovinda, 23.

Chandra, Ashok Kumar, 422.

Chang, Shi-Kuo, 458.

Chartres,  Bruce  Aylwin,  156. 

Chase,  Stephen  Martin,  196. 

Chazelle,  Bernard  Marie,  583. 

Chebyshev, Pafnutii Lvovich (Чебышев,
Пафнутий Львович), 395.
polynomials, 296, 685.

Chen, Wen-Chin, 548.

Cherkassky, Boris Vasilievich (Черкасский,
Борис Васильевич), 152.

Chessboard,  14,  46,  69. 

Chinese  mathematics,  36. 

Choice  of  data  structure,  95-96,  141, 
151-152,  163-164,  170-171,  459, 
561-567. 

Chow,  David  Kuo-kien,  578. 

Christen,  Claude  Andre,  204,  658. 
Chronological  order,  372,  379. 

Chung, Moon Jung, 673.

Chung Graham, Fan Rong King, 402.

Church,  Randolph,  669. 

CI: MIX's comparison indicator, 6.

Cichelli,  Richard  James,  513. 

Circular  lists,  407,  729. 

Ciura,  Marcin  Grzegorz,  95,  623. 

Clausen,  Thomas,  157. 

Cleave,  John  Percival,  400. 

Clement,  Julien  Stephane,  728. 

Cliques,  9. 

Closest match, search for, 9, 394, 408,
563, 566, 581.

CMath:  Concrete  Mathematics,  a book 
by  R.  L.  Graham,  D.  E.  Knuth, 
and  O.  Patashnik. 

CMPA  (compare  rA),  585. 

Coalesced  chaining,  521-525,  543,  548, 
550-554,  557,  730. 

COBOL  language,  339,  532. 


Cocktail-shaker  sort,  110,  134,  356,  676,  694. 
Codes  for  difficulty  of  exercises,  ix-xi. 
Coffman,  Edward  Grady,  Jr.,  496. 

Coldrick,  David  Blair,  638. 

Cole,  Richard  John,  583. 

Colin,  Andrew  John  Theodore,  453,  454. 
Collating,  158,  385-387,  see  Merging. 
Collating  sequence,  7,  420-421. 

Collision  resolution,  514,  520-557. 

Column  sorting,  343. 

Combinatorial  hashing,  573-575,  579-580, 
582,  746. 

Combinatorial  number  system,  573. 

Comer,  Douglas  Earl,  489. 

Commutative  laws,  239,  455. 

Comp. J.: The Computer Journal, a
publication of the British Computer
Society since 1958.

Comparator  modules,  221,  234,  241. 
Comparison  counting  sort,  75-80,  382,  387. 
Comparison-exchange  tree,  196. 

Comparison  matrix,  188. 

Comparison  of  algorithms,  151,  324-338, 
347-348,  380-383,  471,  545-547. 
Comparison  of  keys,  4. 

minimizing,  180-247,  413,  425,  549. 
multiprecision,  6,  136,  169. 
parallel, 113, 222-223, 228-229, 235,
390, 425, 671.

searching  by,  398-399,  409-491,  546-547. 
sorting  by,  80-122,  134-168,  180-197, 
219-343,  348-383. 

Comparison trees, 181-182, 192-197,
217, 219-220, 411-417.

Compiler  techniques,  2-3,  426,  532. 
Complement  notations,  177. 

Complementary  pairs,  9. 

Complemented  block  designs,  581. 

Complete  binary  trees,  144,  152-153,  158, 
211,  217,  253-254,  258,  267,  425. 
Complete P-ary tree, 361, 697.

Complete  ternary  trees,  157. 

Complex  partitions,  21. 

Complexity  analysis  of  algorithms,  168, 
178-179,  180-247,  302-311,  353-356, 
374-378,  388,  412-413,  425,  491, 
539-541,  549,  578. 

Components  of  graphs,  189. 

Compositions,  286-287. 

Compound  attributes,  564,  566-567. 
Compound  leaf  of  a tree,  688. 

Compressed  tries,  507. 
dynamic,  722. 

Compression  of  data,  453,  512. 

Compromise  merge,  297. 

Computational  complexity,  see  Complexity. 
Computational  geometry,  566. 

Computer  operator,  skilled,  325,  337,  349. 
Computer  Sciences  Corporation,  2. 

Comrie,  Leslie  John,  170,  385. 

Concatenation  of  balanced  trees,  474,  479. 
Concatenation  of  linked  lists,  172. 
Concave functions, 443, 456, 458.

Concurrent  access,  491. 

Conditional  expressions,  753. 

Connected  graphs,  189,  733,  742. 

Consecutive  retrieval,  567,  579. 

Convex  functions,  366,  375. 

Convex  hulls,  478,  670. 

Cookies,  567-571,  577. 

Coordinates,  564-566. 

Copyrights,  iv,  387. 

Corless,  Robert  Malcolm,  606. 

Cormen,  Thomas  H.,  477. 

Coroutines,  259. 

Cotangent,  194. 

Counting,  sorting  by,  75-80. 

Covering,  235. 

Coxeter,  Harold  Scott  MacDonald,  593. 
Cramer,  Gabriel,  11. 

Cramer,  Michael,  650. 

Crane, Clark Allan, 149-150, 152, 474,
475, 479, 716.

Crelle: Journal für die reine und angewandte
Mathematik, an international journal
founded by A. L. Crelle in 1826.
Criss-cross  merge,  312-315,  317. 
Cross-indexing,  see  Secondary  key  retrieval. 
Cross-reference  routine,  7. 

Crossword-puzzle  dictionary,  573. 

Cube,  n-dimensional,  linearized,  408. 
Culberson,  Joseph  Carl,  435. 

Culler,  David  Ethan,  390. 

Cundy,  Henry  Martyn,  593. 

Cunto  Pucci,  Walter,  218. 

Curtis,  Pavel,  251. 

Cycles of a permutation, 25-32, 62, 156,
617, 628, 639-640, 657.

Cyclic  occupancy  problem,  379. 

Cyclic  rotation  of  data,  619. 

Cyclic  single  hashing,  556-557. 

Cylinders  of  a disk,  357,  376,  482,  489,  562. 
Cypher,  Robert  Edward,  623. 

Czech,  Zbigniew  Janusz,  513. 

Czen Ping, 186.

Daly,  Lloyd  William,  421. 

Dannenberg,  Roger  Berry,  583. 

Data  compression,  453,  512. 

Data  structure,  choice  of,  95-96,  141, 
151-152,  163-164,  170-171,  459, 
561-567. 

Database,  392. 

David,  Florence  Nightingale,  44,  602,  605. 
Davidson,  Leon,  395. 

Davies,  Donald  Watts,  388. 

Davis,  David  Robert,  578. 

Davison,  Gerald  A.,  152. 

de  Balbine,  Guy,  528. 

de Bruijn, Nicolaas Govert, 130, 138,
602, 668, 670, 671, 744.
de  la  Briandais,  Rene  Edward,  494. 


de  Peyster,  James  Abercrombie,  Jr.,  544. 
de  Stael,  Madame,  see  Stael-Holstein. 
Deadlines,  407. 

Deadlocks,  721. 

Debugging,  520. 

Decision  trees,  181-182,  192-197,  217, 
219-220,  411-417,  443-444. 

Dedekind,  Julius  Wilhelm  Richard,  239. 
sums,  20. 

Degenerate  trees,  430,  454,  711. 

Degenerative  addresses,  547. 

Degree  path  length,  363-367. 

Degrees  of  freedom,  258-259. 

Deift,  Percy  Alec,  611. 

Deletion:  Removing  an  item, 
from  a B-tree,  490. 
from  a balanced  tree,  473,  479. 
from a binary search tree, 431-435,
455, 458.
from a digital search tree, 508.
from a hash table, 533-534, 548-549,
552, 556, 741.
from  a heap,  157. 
from  a leftist  tree,  158. 
from  a multidimensional  tree,  581. 
from  a trie,  507. 

Demuth, Howard B., 109, 184, 246, 348,
353, 387, 388, 676.

Den, Vladimir Eduardovich (Ден,
Владимир Эдуардович), 7.

Denert,  Marlene,  596. 

Dent,  Warren  Thomas,  455. 

Derangements,  679. 

Derr,  John  Irving,  547. 

Descents  of  a permutation,  35,  46,  47,  606. 
Determinants,  11,  14,  19,  33-34. 

Vandermonde,  59,  610,  729. 

Deutsch,  David  Nachman,  204. 

Devroye, Luc Piet-Jan Arthur, 565,
713, 721, 728.

Diaconis,  Persi  Warren,  597. 

Diagram of a partial order, 61-62,
183-184, 187.

Dictionaries  of  English,  1-2,  421,  558,  589. 
Dictionary  order,  5. 

Dietzfelbinger,  Martin  Johannes,  549. 

Digital search trees, 502-505, 508-511,
576, 646.
optimum,  511. 

Digital  searching,  492-512. 

Digital  sorting,  169,  343,  see  Radix  sorting. 
Digital  tree  search,  496-498,  517,  546-547. 
Dijkstra,  Edsger  Wybe,  636. 

Dilcher,  Karl  Heinrich,  726. 

Diminishing  increment  sort,  84. 

Dinsmore,  Robert  Johe,  258. 

Direct-access  memory,  356,  see  Disk  storage. 
Direct  sum  of  graphs,  189-191. 

Directed  graphs,  9,  61-62,  184. 

Discrete  entropy,  374-375. 

Discrete  logarithms,  10. 

Discrete  system  simulation,  149. 
Discriminant, 59, 66, 68.

Disk  storage,  357-379,  389-390,  407, 
481-491,  562-563. 

Disk  striping,  370,  378,  389. 

Disorder,  measures  of,  11,  22,  72,  134,  389. 
Displacements,  variance  of,  556,  619. 
Distribution counting sort, 78-80, 99,
170, 176-177, 380-382.

Distribution  functions,  105,  see  Probability. 
Distribution  patterns,  343-348. 

Distribution  sorting,  see  Radix  sorting. 
Distributive  laws,  239. 

Divide  and  conquer,  175. 

recurrence,  168,  224,  674. 

Divisor  function  d(n),  138. 

Dixon,  John  Douglas,  611. 

DNA,  34,  72. 

Dobkin,  David  Paul,  583. 

Dobosiewicz, Wlodzimierz, 176, 266,
628, 680.

Dodd,  Marisue,  520. 

Dodgson,  Charles  Lutwidge,  207,  see  Carroll. 
Dor, Dorit, 664.

Doren,  David  Gerald,  212,  218. 

Dot  product,  406. 

Double-entry  bookkeeping,  561. 

Double hashing, 528-533, 546, 548,
551-552, 556, 557, 742.

Double  rotation,  461,  464,  477. 

Doubly  exponential  sequences,  467,  715. 
Doubly  linked  list,  393,  375,  543,  646,  713. 
Douglas,  Alexander  Shafto,  98,  388,  396. 
Dowd,  Martin  John,  673. 

Drake,  Paul,  1. 

Driscoll,  James  Robert,  152,  583. 

Drmota,  Michael,  713. 

Dromey,  Robert  Geoffrey,  634. 

Drum  storage,  359-362. 

Drysdale,  Robert  Lewis  (Scot),  III,  228. 

Dual  of  a digraph,  191. 

Dual  tableaux,  56-57,  69. 

Dubost,  Pierre,  747. 

Dudeney,  Henry  Ernest,  589,  670. 

Dugundji,  James,  245. 

Dull,  Brutus  Cyclops,  6,  45,  549. 

Dumas,  Philippe,  134. 

Dumey, Arnold Isaac, 255, 396, 422,
453, 547.

Dummy runs, 248-249, 270-272, 276,
289-293, 299, 302, 312, 316-317,
682, 686.

Dumont, Dominique, 605.

Duplication of code, 398, 418, 429,
625, 648, 677.

Dutch  national  flag  problem,  636. 

Dwyer,  Barry,  574. 

Dynamic  programming,  ix,  438. 

Dynamic  searching,  393. 

Dynamic  storage  allocation,  11,  480. 


e,  41,  526,  748-749,  755. 

Ebbenhorst  Tengbergen,  Cornelia  van,  744. 
Eckert,  John  Presper,  386-387. 

Eckler,  Albert  Ross,  Jr.,  590. 

Eddy,  Gary  Richard,  389. 

Edelman,  Paul  Henry,  670,  719. 
Edge-notched  cards,  1,  569-570,  578. 
Edighoffer,  Judy  Lynn  Harkness,  645. 
Edmund,  Norman  Wilson,  1. 

EDVAC  computer,  385,  386. 

Efe,  Kemal,  680. 

Effective  power,  676,  see  Growth  ratio. 
Efficiency  of  a digraph,  188. 

Ehresmann,  Charles,  628. 

Eichelberger,  Edward  Baxter,  704. 
Eisenbarth,  Bernhard,  489. 

El-Yaniv, Ran, 403.

Elcock,  Edward  Wray,  551,  730. 

Elementary  symmetric  functions,  239,  609. 
Eleser,  see  Breaux. 

Elevators,  353-356,  374-375,  377-378. 

Elias,  Peter,  581. 

Elkies,  Noam  David,  9. 

Ellery,  Robert  Lewis  John,  395. 

Emde  Boas,  Peter  van,  152. 

Emden,  Maarten  Herman  van,  128,  633,  638. 
Empirical  data,  94-95,  403,  434-435, 
468-470. 

English  language,  1-2,  9,  421. 

common  words,  435-437,  492-493, 
496-497,  513-515. 
dictionaries,  1-2,  421,  558,  589. 
letter  frequencies,  448-450. 

Entropy,  442-446,  454,  457-458. 
Enumeration  of  binary  trees,  60-61,  295. 
balanced,  467,  479. 
leftist,  157. 

Enumeration  of  permutations,  12,  22-24. 
Enumeration  of  trees,  287. 

Enumeration  sorting,  75-80. 

Eppinger,  Jeffrey  Lee,  434,  435. 

Equal  keys,  194-195,  341,  391,  395,  431,  635. 
approximately,  9,  394-395. 
in  heapsort,  655. 
in  quicksort,  136,  635-636. 
in  radix  exchange,  127-128,  137. 

Equality  of  sets,  207. 

Eratosthenes of Cyrene (Ἐρατοσθένης
ὁ Κυρηναῖος), 642.

Erdélyi, Artur (= Arthur), 131.

Erdős, Pál (= Paul), 66, 155, 658.

Erdwinn,  Joel  Dyne,  2. 

Erkio,  Hannu  Heikki  Antero,  623. 
Error-correcting  codes,  581. 

Ershov, Andrei Petrovich (Ершов, Андрей
Петрович), 547.

Espelid,  Terje  Oskar,  259. 

Estivill-Castro,  Vladimir,  389. 

Euler, Leonhard (Ейлеръ, Леонардъ =
Эйлер, Леонард), 8-9, 19-21, 35,
38-39, 395, 593-594, 726.
numbers  (secant  numbers),  35,  610-611. 
summation  formula,  64,  129,  626,  702. 
Eulerian numbers, 35-40, 45-47, 653.
table,  37. 

Eusterbrock,  Jutta,  213. 

Eve,  James,  496. 

Even-odd  merge,  244. 

Even  permutations,  19,  196. 

Evolutionary  process,  226,  401. 

Exchange  selection  sort,  106. 

Exchange sorting, 73, 105-138.
optimum,  196. 

Exclusive  or,  20,  519,  589,  667,  723. 
Exercises,  notes  on,  ix-xi. 

Exponential function, q-generalized, 594.
Exponential  integral,  105,  137,  735. 
Extended binary tree: Either a single
"external" node, or an "internal" root
node plus its left and right extended
binary subtrees, 181.

Extendible  hashing,  549. 

External  nodes,  181,  254. 

External path length: Sum of the level
numbers of all external nodes, 192,
303, 306, 344, 347, 361, 363-367,
413, 434, 502, 505-506.
modified, 502-503, 511.

External searching, 403-408, 481-491,
496, 498-500, 541-544, 549, 555,
562-563, 572-573.

External sorting, 4-5, 6-10, 248-379.

Factorials,  23,  187. 

generalized,  32,  594. 

Factorization  of  permutations,  25-35. 
Fagin,  Ronald,  549. 

Fallacious  reasoning,  45,  60,  424,  553. 
Falling  powers,  638-639,  661,  734,  753. 
False  drops,  571-573,  579,  590. 

Fanout,  232,  241. 

Fast  Fourier  transforms,  237. 

Fawkes,  Guido  (Guy),  339. 

Feature  cards,  569-570,  578. 

Feigenbaum,  Joan,  478. 

Feijen,  Wilhelmus  (=  Wim)  Hendricus 
Johannes,  636. 

Feindler,  Robert,  385. 

Feit,  Walter,  609. 

Feldman,  Jerome  Arthur,  578. 

Feldman,  Paul  Neil,  426. 

Feller,  Willibald  (=  Vilim  = Willy  = 
William),  513. 

Felsner,  Stefan,  658. 

Fenner,  Trevor  Ian,  645. 

Ferguson,  David  Elton,  2,  290-291,  297, 
299,  367,  422,  525,  685. 

Fermat,  Pierre  de,  584. 

Ferragina,  Paolo,  489. 

Feurzeig,  Wallace,  79. 

Fiat, Amos, 708.

Fibonacci,  Leonardo,  of  Pisa  (=  Leonardo 
filio  Bonacii  Pisano),  424. 


Fibonacci  hashing,  xiv,  517-518. 

Fibonacci  heaps,  152. 

Fibonacci  number  system,  348,  424,  729. 
generalized,  286. 

Fibonacci  numbers,  93,  268,  287,  418,  426, 
518,  623,  687,  746,  750. 
generalized,  270,  286,  287,  see  also 
Cascade  numbers. 

Fibonacci  search,  417. 

Fibonacci trees, 417, 422-424, 457, 459,
460, 468, 474, 479, 714.

Fibonaccian  search,  417-419,  422-424. 

Field,  Daniel  Eugene,  583. 

FIFO,  149,  299,  310,  see  Queues. 

File:  A sequence  of  records,  4,  392. 
self-organizing, 401-403, 405-406,
478, 521, 646.

Finding  the  maximum,  141,  209. 
and  minimum,  218. 

Fingers,  718. 

Finite  fields,  549-550. 

Finkel,  Raphael  Ari,  565,  566,  566. 

First-fit  allocation,  480,  721. 

First-in-first-out,  149,  299,  310,  see  Queues. 

Fischer,  Michael  John,  152. 

Fishburn,  John  Scot,  721. 

Fishspear,  152. 

Fixed  points  of  a permutation,  62,  66,  617. 

Flajolet, Philippe Patrick Michel, 134,
565, 566, 576, 630, 644, 649, 703,
726, 728, 742.

Flip  operation,  72. 

Floating  buffers,  323,  324,  340,  369. 

Floating  point  accuracy,  41. 

Flores,  Ivan,  388. 

Floyd,  Robert  W,  145,  156,  215,  217,  218, 
226,  230,  237,  238,  240,  297,  374,  375, 
377,  378,  455,  468,  519,  614,  661,  695. 

Foata, Dominique Cyprien, 17, 21, 24, 27,
33, 39, 43, 599, 618, 732, 733.

FOCS:  Proceedings  of  the  IEEE  Symposia 
on  Foundations  of  Computer  Science 
(1975-),  formerly  called  the  Symposia 
on  Switching  Circuit  Theory  and 
Logic  Design  (1960-1965),  Symposia 
on  Switching  and  Automata  Theory 
(1966-1974). 

Folding  a path,  112-113,  134. 

Foldout  illustration,  324-325. 

Fomin, Sergey Vladimirovich (Фомин,
Сергей Владимирович), 671.

Ford,  Donald  Floyd,  395. 

Ford,  Lester  Randolph,  Jr.,  184,  186. 

Forecasting, 321-324, 340, 341, 369,
387, 388, 693.

Forest: Zero or more trees, 47, 494-496,
508, 512.

FORTRAN  language,  2-3,  7,  426,  549. 

Forward-testing-backward-insertion,  204. 

Foster,  Caxton  Croxford,  470,  473,  475,  714. 

Fractal  probability  distribution,  400. 

Fractile  insertion,  660. 
Frame, James Sutherland, 60.

Françon, Jean, 152.

Frank,  Robert  Morris,  93. 

Franklin,  Fabian,  19,  21,  599. 

Fraser,  Christopher  Warwick,  583. 

Frazer, William Donald, 122, 259,
678, 704, 708.

Fredkin, Edward, 492.

Fredman, Michael Lawrence, 152, 181,
442, 480, 549, 578, 614.

Free  distributive  lattice,  239. 

Free  groups,  511-512. 

Free  trees,  356,  590. 

Frequency  of  access,  399-408,  435,  532,  538. 
Friedman,  Haya,  718. 

Friedman,  Jerome  Harold,  566. 

Friend,  Edward  Harry,  79,  109,  141,  170, 
255,  324,  337,  338,  347,  388,  650. 

Frieze,  Alan  Michael,  645. 

Fringe  analyses,  715. 

Frobenius,  Ferdinand  Georg,  59. 

Front  and  rear  compression,  512. 
Fussenegger,  Frank,  217. 

Gabow,  Harold  Neil,  152,  217. 

Gaines,  Helen  Fouche,  435. 

Gale,  David,  668. 

Galen, Claudius (Γαληνός, Κλαύδιος), 421.
Galil, Zvi, 181.

Gamma function Γ(z), 131-134, 138,
510, 611, 636-637, 702.

Gandz,  Solomon,  23. 

Gardner, Erle Stanley, 1.

Gardner,  Martin,  370,  585,  590,  651,  697. 
Gardy,  Daniele,  703. 

Garsia,  Adriano  Mario,  454,  597,  711. 
Garsia-Wachs  algorithm,  446-452,  458. 
Gasarch,  William  Ian,  213. 

Gassner,  Betty  Jane,  40-41,  262. 

Gaudette,  Charles  H.,  347. 

Gauß (= Gauss), Johann Friderich Carl
(=  Carl  Friedrich),  395. 
integers,  21. 

gcd:  Greatest  common  divisor. 

Generable  integer,  103. 

Generating functions, techniques for using,
15-17, 19-20, 32-34, 38-42, 45-47,
68, 102-104, 135, 177, 194, 261-262,
270, 275-276, 294-299, 340-341,
425, 455, 479, 490, 503-506, 539,
553, 619, 678, 695, 703.

Genes,  72. 

Genetic  algorithms,  226,  229. 

Genoa,  Giovanni  di,  421. 

Geometric  data,  563-566. 

George,  John  Alan,  707. 

Gessel,  Ira  Martin,  597. 

Getu, Seyoum, 607.

Ghosh, Sakti Pada, 395, 487, 578, 579.


Gibson,  Kim  Dean,  589. 

Gilad-Bachrach, Ran, 403.
Gilbert,  Edgar  Nelson,  453,  454. 

Gilbreath,  Norman  Laurence,  370. 

principle,  370,  378. 

Gillis, Joseph, 601.

Gilstad,  Russell  Leif,  268,  301,  336,  721. 
Gini,  Corrado,  401. 

Gleason,  Andrew  Mattei,  193,  648. 

Goetz, Martin Alvin, 297, 315, 316,
338, 368, 388, 680.

Goldberg, Andrew Vladislav (Гольдберг,
Андрей Владиславович), 152.

Golden  ratio,  xiv,  138,  517-518,  748-749. 
Goldenberg,  Daniel,  387. 

Goldstein,  Larry  Joel,  673. 

Golin, Mordecai Jay, 649.

Gonnet Haas, Gaston Henry, 489, 533,
565, 606, 707, 734.

Good,  Irving  John,  513. 

Goodman,  Jacob  Eli,  566. 

Goodwin,  David  Thomas,  302. 

Gore,  John  Kinsey,  385. 

Gotlieb,  Calvin  Carl,  388,  442. 

Goto, Eiichi, 534.

Gourdon,  Xavier  Richard,  134. 

GPX  system,  738. 

Grabner,  Peter  Johannes,  644. 

Graham, Ronald Lewis, 198, 202-203,
205-206, 242, 550, 597, 729, 762.
Grasselli,  Antonio,  670. 

Grassl,  Richard  Michael,  69. 

Gray,  Harry  Joshua,  Jr.,  578. 

Gray,  James  Nicholas,  390. 

Greatest  common  divisor,  91,  185,  683-684. 
Green, Milton Webster, 227, 239, 667,
668, 673.

Greene,  Curtis,  70,  670,  718,  744. 

Greene,  Daniel  Hill,  736. 

Greek  mathematics,  420. 

Greniewski,  Marek  Jozef,  513. 

Grid  files,  564,  565. 

Gries,  David  Joseph,  618. 

Grinberg, Victor Simkhovich (Гринберг,
Виктор Симхович), 671.

Griswold,  William  Gale,  549. 

Gross,  Oliver  Alfred,  194,  653. 

Grossi,  Roberto,  489. 

Group,  free,  511-512. 

Group  divisible  block  designs,  746. 

Grove,  Edward  Franklin,  371. 

Growth  ratio,  273. 

Guibas, Leonidas John (Γκίμπας, Λεωνίδας
Ἰωάννου), 477, 525, 709, 737.

Guilbaud,  Georges  Theodule,  593. 

Gunji, Takao, 534.

Gustafson,  Richard  Alexander,  573. 
Gustavson,  Frances  Goertzel,  698. 
Gwehenberger,  Gernot,  498. 

Gyrating  sort,  315. 
h-ordered sequence, 86, 103-104, 243.
Hadian,  Abdollah,  186,  212,  217. 

Hajela, Dhananjay, 402.
Hajnal, András, 747.

Half-balanced  trees,  477. 

Hall,  Marshall,  Jr.,  511,  578. 

Halperin,  John  Harris,  625. 

Halpern,  Mark  Irwin,  422. 

Hamilton,  Douglas  Alan,  711. 

Han, Guo-Niu, 595, 596, 599, 602.

Hanan,  Maurice,  729. 

Hannenhalli, Sridhar Subrahmanyam, 615.

Haralambous, Yannis (Χαραλάμπους,
Ἰωάννης), 782.

Hardy,  Godfrey  Harold,  704. 

Hardy,  Norman,  590. 

Hare,  David  Edwin  George,  606. 

Harmonic  numbers,  633,  750-751. 

generalized,  400,  405. 

Harper,  Lawrence  Hueston,  704. 

Harrison,  Malcolm  Charles,  572,  579,  745. 
Hash  functions,  514-520,  529,  549-550, 
557-558. 

combinatorial,  573-575,  579-580,  582,  746. 
Hash  sequences,  535,  552. 

Hashing, 513-558.

Havas,  George,  513. 

Hayward,  Ryan  Bruce,  636,  642. 

Heap:  A heap-ordered  array,  144-145,  149, 
156-157,  253,  336,  646,  680,  705. 
t-ary,  644. 

Heap  order,  144-145. 

Heaps,  Harold  Stanley,  578. 

Heapsort, 144-148, 152-158, 336, 381,
382, 389, 698.
with equal keys, 655.

Heide,  see  Meyer  auf  der  Heide. 
Height-balanced  trees,  475,  480. 

Height of extended binary tree, 195,
459, 463.

of  random  binary  search  tree,  458. 
of  random  digital  search  tree,  728. 
of  random  (M  + l)-ary  search  tree,  721. 
of  random  Patricia  tree,  728. 
of  random  trie,  512. 

Heilbronn,  Hans  Arnold,  395. 

Heising,  William  Paul,  400,  739. 

Heller,  Robert  Andrew,  512. 

Hellerman,  Herbert,  548. 

Hellerstein,  Joseph  Meir,  390. 

Hellman, Martin Edward, 591.

Hendricks,  Walter  James,  703. 

Hennequin,  Pascal  Daniel  Michel  Henri,  632. 
Hennie,  Frederick  Clair,  351,  356. 

Hermite,  Charles,  polynomial,  62. 

Herrick,  Robert,  408. 

Hibbard, Thomas Nathaniel, 20, 93, 196,
226, 388, 389, 413, 432, 453, 657.
Hilbert,  David,  395. 


Hildebrandt,  Paul,  128. 

Hillman,  Abraham  P,  69. 

Hindenburg,  Carl  Friedrich,  14. 

Hindu  mathematics,  23. 

Hinrichs,  Klaus  Helmer,  564. 

Hinterberger,  Hans,  564. 

Hinton,  Charles  Howard,  593. 

Hoare,  Charles  Antony  Richard,  114, 
121-122,  136,  381,  389. 

Hobby,  John  Douglas,  782. 

Hoey,  Daniel  J.,  215. 

Holberton, Frances Elizabeth Snyder,
324, 386, 387.

Hollerith,  Herman,  383-385. 

Holt  Hopfenberg,  Anatol  Wolf,  738. 

Homer (Ὅμηρος), 421.

Homogeneous  polynomial,  66. 

Hooker,  William  Weston,  42. 

Hooking-up  of  queues,  172,  177. 

Hooks,  59-60,  69-71. 
generalized,  67. 

Hopcroft, John Edward, 245, 476,
477, 590, 652.

Hoshi, Mamoru, 727.

Hosken,  James  Cuthbert,  388,  391. 

Hot  queues,  152. 

Hsu, Meichun, 488.

Hu, Te Chiang, 454, 711, 713.
Hu-Tucker algorithm, 454.

Huang Bing-Chao, 702.

Hubbard,  George  Underwood,  363,  389. 
Huddleston,  Charles  Scott,  477,  718. 
Huffman,  David  Albert,  trees,  361,  438,  458. 
Human-computer  interaction,  588. 

Hunt,  Douglas  Hamilton,  88. 

Hurwitz,  Henry,  633. 

Hwang, Frank Kwangming, 188, 195, 202-206.

Hwang, Hsien-Kuei, 650.

Hyafil,  Laurent  Daniel,  218,  377. 

Hybrid  searching  methods,  496. 

Hybrid  sorting  methods,  122,  163,  381,  297. 
Hypercube,  linearized,  408. 

Hypergeometric  functions,  537,  565,  739. 
Hyphenation,  531,  572,  722. 

Hysterical  B-trees,  477. 

IBM  701  computer,  547. 

IBM  705  computer,  82-83. 

IBM Corporation, 8, 169, 193, 316, 385, 489, 547.

IBM  RS/6000  computer,  389. 

Idempotent  laws,  239. 

Identifier:  A symbolic  name  in  an 
algebraic  language,  2. 

Identity  element,  24. 

Implicit  data  structures,  426,  481. 
In situ permutation, 79-80, 178.

In  the  past,  see  Persistent  data  structures. 
Inakibit-Anu of Uruk, 420.

Incerpi,  Janet  Marie,  91,  93. 
Inclusion and exclusion principle, 586, 703, 744.

Inclusion  of  sets,  393-394. 

Inclusive  queries,  569-573,  577. 

Incomplete gamma function γ(a, x), 555.
Increasing  forests,  47. 

Independent  random  probing,  548,  555. 
Index  keys  for  file  partitioning,  512. 

Index  of  a permutation,  16-17,  21-22,  32. 
Index  to  this  book,  561-562,  757-782. 
Indexed-sequential  files,  482. 

Indian  mathematics,  23. 

Infinity, 4, 138-139, 142-144, 156, 214, 257-258, 263, 521, 624-625, 646, 663-664, 685, 707.
as  sentinel,  159,  252,  308,  324. 

Information  retrieval,  392,  395. 

Information  theory,  183,  198,  442-444,  633. 
lower bounds from, 183, 194, 202, 204, 655.

Inner loop: Part of a program whose instructions are performed much more often than the neighboring parts, 162, see Loop optimization.

Insertion:  adding  a new  item, 
into  a 2-3  tree,  476-477. 
into  a B-tree,  483-485. 
into  a balanced  tree,  461-473,  479. 
into a binary search tree, 427-429, 458, 482.

into  a digital  search  tree,  497. 

into  a hash  table,  522,  526,  529,  551. 

into  a heap,  156. 

into  a leftist  tree,  150,  157. 

into  a trie,  507. 

into  a weight-balanced  tree,  480. 

Insertion  sorting,  73,  80-105,  222. 

Interblock  gaps,  318-319,  331. 

Intercalation product of permutations, 24-35.

Interchanging  blocks  of  data,  619,  701. 
Internal  (branch)  node,  181,  254,  see 
Extended  binary  tree. 

Internal  path  length,  413,  434,  455,  502,  565. 

generating  function  for,  621. 

Internal  searching,  396-481,  492-512, 
513-541,  545-558. 
summary,  545-547. 

Internal  sorting,  4-5,  73-179. 

summary,  380-383. 

Internet,  iv,  x. 

Interpolation  search,  419-420,  422,  425. 
Interval-exchange  sort,  128. 

Interval  heap,  645. 

Intervals,  578. 

Inverse  in  a group,  511. 

Inverse  modulo  m,  517. 

Inverse  of  a permutation,  13-14,  18,  48, 
53-54,  74,  134,  596,  605. 
for  multisets,  32. 


Inversion  tables  of  a permutation,  11-12, 
17-18,  108,  134,  349,  605,  613,  619. 
Inversions of a permutation, 11-22, 32, 77, 82, 86-90, 97, 100, 103, 108, 111, 134, 156, 168, 178, 349, 353, 599, 676, 678, 733.
with  equality,  699. 

Inversions  of  tree  labelings,  609,  733. 
Inverted  files,  560-563,  567,  569-570,  577. 
Involution  coding,  426. 

Involutions,  18,  48,  54,  62-64,  66,  69. 
Isaac,  Earl  Judson,  99. 

Isaiah, son of Amoz, 247.
Isbitz,  Harold,  128. 

Isidorus  of  Seville,  Saint  (San  Isidoro 
de  Sevilla),  421. 

Ismail, Mourad El Houssieny, 601.

Isomorphic  invariants,  590. 

Isomorphism  testing,  9. 

Itai, Alon, 707, 711.

Iverson,  Kenneth  Eugene,  110,  142,  388, 
396,  422,  423,  454,  704. 

JACM:  Journal  of  the  ACM,  a publication 
of  the  Association  for  Computing 
Machinery  since  1954,  440. 

Jackson,  James  Richard,  407. 

Jacobi,  Carl  Gustav  Jacob,  20-21. 

Jacquet,  Philippe  Pierre,  726. 

JAE  (Jump  if  rA  even),  125-126. 

Jainism,  23. 

Janson,  Carl  Svante,  607,  627,  734,  736. 
JAO (Jump if rA odd), 125-126.

Japanese  mathematics,  36. 

Jeffrey,  David  John,  606. 

Jensen,  Johan  Ludvig  William  Valdemar, 
711. 

Jewish  mathematics,  23. 

Jiang Ling, 589.

Johansson,  Kurt  Ove,  611. 

John,  John  Welliaveetil,  213,  663,  666. 
Johnsen,  Robert  Lawrence,  512. 

Johnsen,  Thorstein  Lunde,  297. 

Johnson,  Lyle  Robert,  502,  578. 

Johnson,  Selmer  Martin,  184,  186. 

Johnson,  Stephen  Curtis,  555. 

Johnson,  Theodore  Joseph,  488. 

Joke,  571. 

Jonassen,  Arne  Tormod,  713. 

Josephus, Flavius, son of Matthias (יוסף בן מתתיהו = Φλάβιος Ἰώσηπος Ματθίου), 17, 21.
problem,  17-18,  592. 

Juille,  Hugues  Rene,  226. 

Jump  operators  of  MIX,  6. 

k-d trees, 565-566, 581, 746.
k-d tries, 576.

Kaas,  Robert,  152. 

Kabbalah,  23. 
Kaehler, Edwin Bruno, 658.

Kalai, Gil, 676.

Kaman,  Charles  Henry,  531,  532. 

Kant,  Immanuel,  395. 

Kanter,  David  Philip,  677. 

Kaplan, Aryeh, 23.

Kaplan, Haim, 615.

Kaplansky,  Irving,  46. 

Karlin,  Anna  Rochelle,  549. 

Karp,  Richard  Manning,  105,  198,  287, 
302,  306,  308-311,  315,  347,  352, 
353-354,  356,  636,  668,  707. 

Karpinski (= Karpiński), Marek Mieczysław, 454.

Katajainen,  Jyrki  Juhani,  649. 

Kaufman,  Marc  Thomas,  483. 

Kautz,  William  Hall,  572,  581,  670. 

Kececioglu,  John  Dmitri,  614. 

Kelly,  Wayne  Anthony,  213. 

Kemp,  Rainer,  287,  645. 

Kerov, Sergei Vasilievich (Керов, Сергей Васильевич), 611.

Keys,  4,  392. 

Keysorting,  74,  335,  373-376,  378. 

Khizder, Leonid Abramovich (Хиздер, Леонид Абрамович), 479.

Kingston,  Jeffrey  H.,  454. 

Kipling,  Joseph  Rudyard,  74. 

Kirchhoff, Gustav Robert, first law, 118, 127.

Kirkman,  Thomas  Penyngton,  577,  580. 
triple  systems,  580-581. 

Kirkpatrick,  David  Galer,  213,  218,  663. 

Kirschenhofer,  Peter,  576,  634,  644,  726. 

Kislitsyn, Sergei Sergeevich (Кислицын, Сергей Сергеевич), 197, 209, 210, 212, 217, 661.

Klarner,  David  Anthony,  585. 

Klein,  Christian  Felix,  745. 

Klein,  Rolf,  714. 

Kleitman,  Daniel  J (Isaiah  Solomon), 
452,  454,  744. 

Klerer,  Melvin,  297,  388. 

Knockout tournament, 141-142, 207, 210, 212.

Knott, Gary Don, 21, 434, 519, 529, 549, 709, 710.

Knuth, Donald Ervin (高德纳), ii, iv, vii,
8,  58,  152,  226,  297,  385,  389,  395, 
398,  420,  422,  454,  478,  536,  585, 
594,  600,  603,  606,  627,  634,  658, 
670,  696,  702,  713,  722,  734,  736, 
741,  742,  758,  762,  782. 

Koch,  Gary  Grove,  578. 

Koester,  Charles  Edward,  390. 

Kohler,  Peter,  669. 

Kollar,  Lubor,  656,  660. 

Komlós, János, 228, 549, 673, 740.

Konheim, Alan Gustave, 376, 505, 548, 732, 740.


Koornwinder,  Tom  Hendrik,  601. 

Korn,  Granino  Arthur,  297,  388. 

Körner, János, 513.

Korshunov, Aleksey Dmitrievich (Коршунов, Алексей Дмитриевич), 669.

Kreweras,  Germain,  733. 

Kronecker,  Leopold,  753. 

Kronmal,  Richard  Aaron,  99. 

Kronrod, Mikhail Aleksandrovich (Кронрод, Михаил Александрович), 168.

Krutar,  Rudolph  Allen,  551. 

Kruyswijk,  Dirk,  744. 

Kummer,  Ernst  Eduard,  739. 

Kwan,  Lun  Cheung,  657. 

KWIC  index,  439-442,  446,  494. 

La  Poutre,  Johannes  Antonius  (=  Han),  747. 
Labelle,  Gilbert,  565. 

Ladner,  Richard  Emil,  389. 

Laforest,  Louise,  565. 

Lagrange  (=  de  la  Grange),  Joseph  Louis, 
Comte,  inversion  formula,  555. 

Laguerre, Edmond Nicolas, polynomials, 601.

LaMarca,  Anthony  George,  389. 

Lambert,  Johann  Heinrich,  644. 
series,  644. 

Lampson,  Butler  Wright,  525. 

Landauer,  Walter  Isfried,  482,  578. 

Lander,  Leon  Joseph,  8. 

Landis, Evgenii Mikhailovich (Ландис, Евгений Михайлович), 459, 460.
Langston,  Michael  Allen,  702. 

Lapko, Olga Georgievna (Лапко, Ольга Георгиевна), 782.

Laplace  (=  de  la  Place),  Pierre  Simon, 
Marquis  de,  64. 

LARC  Scientific  Compiler,  2. 

Large  deviations,  636. 

Largest-in-first-out,  see  Priority  queues. 
Larmore,  Lawrence  Louis,  454. 

Larson, Per-Åke, 549, 741.

Lascoux,  Alain,  670. 

Last-come-first-served,  742. 

Last-in-first-out,  148,  299,  305. 

Latency  time,  358-359,  376,  489,  562-563. 
Latin  language,  421. 

Lattice,  of  bit  vectors,  235. 

of  permutations,  13,  19,  22,  628. 
of  trees,  718. 

Lattice paths, 86-87, 102-103, 112-113, 134, 579.

Lawler,  Eugene  Leighton,  207. 

Lazarus,  Roger  Ben,  93. 

Least-recently-used page replacement, 158, 488.

Least-significant-digit-first  radix  sort, 
169-179,  351. 

Leaves,  483,  486,  507. 

Lee, Der-Tsai, 566.

Lee, Tsai-hwa, 388.


Leeuwen, Jan van, 645.

Leeuwen,  Marcus  Aurelius  Augustinus 
van,  611. 

Lefkowitz,  David,  578. 

Left-to-right (or right-to-left) maxima or minima, 12-13, 27, 82, 86, 100, 105, 156, 624.

Leftist  trees,  150-152,  157-158. 
deletion  from,  158. 
insertion  into,  150,  157. 
merging,  150,  157. 

Lehmer,  Derrick  Henry,  422. 

Leibholz,  Stephen  Wolfgang,  673. 

Leiserson,  Charles  Eric,  477. 

Levcopoulos, Christos (Λευκόπουλος, Χρήστος), 389.

Level  of  a tree  node:  The  distance 
to  the  root. 

Levenshtein, Vladimir Iosifovich (Левенштейн, Владимир Иосифович), 585.

LeVeque,  William  Judson,  584. 

Levitt,  Karl  Norman,  670. 

Levy,  Silvio  Vieira  Ferreira,  vii. 
Lexicographic  order,  5,  6,  70,  169,  171,  178, 
394,  420-421,  453,  567,  590,  614,  615. 
lg:  Binary  logarithm,  206,  755. 

Li Shan-Lan (李善蘭), 36.

Liang,  Franklin  Mark,  722,  729. 

Library  card  sorting,  7-8. 

Liddell  Hargreaves,  Alice  Pleasance,  584. 
Liddy,  George  Gordon,  395. 

LIFO,  148,  299,  305,  see  Stacks. 

Lin,  Andrew  Damon,  547,  578. 

Lin, Shen, 188, 195, 202-206.

Lineal  chart,  424. 

Linear  algorithm  for  median,  214-215,  695. 
Linear  algorithms  for  sorting,  5,  102, 
176-179,  196,  616. 

Linear  arrangements,  optimum,  408. 

Linear  congruential  sequence,  383. 

Linear  hashing,  548-549. 

Linear  lists,  248,  385,  459,  see  also 
List  sorting. 

representation of, 96-97, 163-164, 471-473, 479-480, 491, 547.

Linear  order,  4,  181. 

Linear  probing,  526-528,  533-534,  536-539, 
543-544,  547,  548,  551-553,  555,  556. 
optimum,  532. 

Ling, Huei, 578.

Linial, Nathan, 660.

Linked  allocation,  74-75,  96,  99-102,  104, 
164-165,  170-173,  399,  405,  459,  547. 
Linn,  John  Charles,  425. 

Lint,  Jacobus  Hendricus  van,  729,  747. 
Lissajous,  Jules  Antoine,  395. 

List  head,  267,  462. 

List  insertion  sort,  95-98,  104,  380,  382. 

List  merge  sort,  164-168,  183,  381,  382,  390. 


List  sorting,  74-75,  80,  164-168,  171-178, 
382,  390,  698. 

Littlewood,  Dudley  Ernest,  610,  612. 
Littlewood,  John  Edensor,  704. 

Litwin,  Samuel,  578. 

Litwin,  Witold  Andre,  548-549. 

Livius,  Titus,  v. 

Lloyd,  Stuart  Phinney,  395. 

Load  factor,  524,  542. 

Load  point,  319,  320. 

Logan,  Benjamin  Franklin  (=  Tex),  Jr.,  611. 
Logarithmic  search,  410. 

Logarithms,  206. 
discrete,  10. 

Logg,  George  Edward,  714. 

Logical  tape  unit  numbers,  271,  292-293. 
Long  runs,  47. 

Longest  common  prefix,  446,  512. 

Longest  increasing  subsequence,  66,  68,  614. 
Longest  match,  493,  512. 

Loop optimization, 85, 104-105, 136, 156, 167, 397-398, 405, 418, 423, 425, 429, 551, 625.

Losers,  253-254,  257-258,  263,  267. 
Louchard,  Guy,  707,  713. 

Lozinskii, Eliezer Leonid Solomonovich (Лозинский, Леонид Соломонович), 647.

LSD:  Least  significant  digit,  175. 

Lucas,  Francois  Edouard  Anatole,  611. 

numbers  Ln,  467,  708. 

Łuczak, Tomasz Jan, 734.

Lueker,  George  Schick,  742. 

Luhn,  Hans  Peter,  440,  547. 

Łukasiewicz, Jan, 395, 672.

Luke,  Richard  C.,  547. 

Lum, Vincent Yu-sun, 520, 578.
Lynch,  William  Charles,  682,  709. 

m-d tree, see k-d tree.
m-d trie, see k-d trie.

Machiavelli,  Niccolo  di  Bernardo,  1. 
MacLaren, Malcolm Donald, 176, 178, 179, 380, 618.

MacLeod,  Iain  Donald  Graham,  617. 
MacMahon,  Percy  Alexander,  8,  16-17,  20, 
27,  33,  43,  45,  59,  61,  70,  600,  613,  653. 
Master  Theorem,  33-34. 

Macro  language,  457. 

Magic  trick,  370. 

Magnetic  tapes,  6-10,  248-251,  267-357, 
403-407. 

reliability  of,  337. 

Magnus,  Wilhelm,  131. 

Mahmoud, Hosam Mahmoud, 713, 721.

Mahon,  Maurice  Harlang  (=  Magenta),  xi. 
Maier,  David,  477. 

Majewski,  Bohdan  Stanislaw,  513. 

Major  index,  see  Index. 

Majorization,  406. 
Mallach, Efrem Gershon, 533.

Mallows,  Colin  Lingwood,  44,  603,  733. 
Maly,  Kurt,  721. 

Manacher,  Glenn  Keith,  192,  204. 

Maniac  II  computer,  187. 

Mankin,  Efrem  S.,  698. 

Mann,  Henry  Berthold,  623. 

Mannila,  Heikki  Olavi,  389. 

Margoliash,  Daniel  Joseph,  573. 

Markov (= Markoff), Andrei Andreevich (Марков, Андрей Андреевич), the elder, process, 340-341.

Marriage  theorem,  747. 

Marsaglia,  George,  707. 

Martin,  Thomas  Hughes,  487. 

Martínez Parra, Conrado, 478, 634.

Marton,  Katalin,  513. 

Martzloff,  Jean-Claude,  36. 

Mason,  Perry,  1,  2. 

Match, search for closest, 9, 394, 408, 563, 566, 581.

Matching,  747. 

Math.  Comp.:  Mathematics  of  Computation 
(1960-), a publication of the American
Mathematical  Society  since  1965; 
founded  by  the  National  Research 
Council  of  the  National  Academy 
of  Sciences  under  the  original  title 
Mathematical  Tables  and  Other  Aids 
to  Computation  (1943-1959). 

Mathsort,  79. 

Matrix:  A two-dimensional  array. 

representation  of  permutations,  14,  48. 
searching  in  a,  207. 
transpose  of  a,  6-7,  14,  567,  617. 
Matsunaga, Yoshisuke (松永良弼), 36.
Matula,  David  William,  216. 

Mauchly,  John  William,  82,  346,  348, 
386-387,  422. 

Maximum-and-minimum  finding,  218. 
Maximum  finding,  141,  209. 

McAllester,  Robert  Linne,  282. 

McAndrew,  Michael  Harry,  502. 

McCabe,  John,  402-403. 

McCall’s  Cook  Book,  8,  568. 

McCarthy,  John,  8,  128,  167. 

McCracken,  Daniel  Delbert,  388,  422. 
McCreight, Edward Meyers, 480, 482, 483, 487-490, 578, 719.

McDiarmid, Colin John Hunter, 636, 642, 643.

McGeoch,  Catherine  Cole,  403. 

McIlroy, Malcolm Douglas, 122, 177, 635, 652, 738.

McIlroy, Peter Martin, 177, 652.

McKellar,  Archie  Charles,  122,  378,  708. 
McKenna,  James,  266. 

McNamee,  Carole  Mattern,  623. 

McNutt,  Bruce,  563-564. 

Measures  of  disorder,  11,  22,  72,  134,  389. 


Median, 122, 136, 214-215, 217-218, 238, 566, 695, 701.
linear  algorithm  for,  214-215,  695. 
Median-of-three  quickfind,  634. 
Median-of-three quicksort, 122, 136, 138, 381, 382.

Mehlhorn, Kurt, 442, 454, 477, 489, 549, 713, 715, 718.

Meister,  Bernd,  740. 

Mellin,  Robert  Hjalmar,  transforms, 
133-134,  506,  644,  649. 

Mendelson, Haim, 728.

Merge  exchange  sort,  110-113,  134-135, 
223,  226,  381,  382,  389. 

Merge insertion sort, 184-187, 192, 193, 381, 389.

Merge  numbers,  274. 

Merge patterns, see Balanced merge, Cascade merge, Oscillating sort, Polyphase merge.
dual to distribution patterns, 345-348, 359.

for  disks,  362-365,  376-377. 
for  tapes,  248-251,  267-317. 
optimum,  302-311,  363-367,  376-377. 
summary,  324-338. 

tree  representation  of,  303-306,  309-311, 
363-364,  377. 

vector representation of, 302-303, 309, 310.

Merge  replacement  sort,  680. 

Merge sorting, 98, 158-168; see List merge, Natural merge, Straight merge.
external, see Merge patterns.
Merge-until-empty  strategy,  287. 

Merging,  385,  390,  480,  698. 

k-way, 166, 252-254, 321-324, 339-343,
360-373,  379. 

networks  for,  223-226,  230-232,  237,  239. 
with  fewest  comparisons,  197-207. 
Mersenne,  Marin,  23-24. 

METAFONT,  iv,  vi,  782. 

METAPOST,  vii,  782. 

Meyer,  Curt,  20. 

Meyer,  Werner,  606. 

Meyer  auf  der  Heide,  Friedhelm,  549. 
Middle  square,  515,  516. 

Middle  third,  240,  241. 

Miles,  Ernest  Percy,  Jr.,  285. 

Miltersen,  Peter  Bro,  230. 

Minimax,  192,  195. 

Minimean,  192,  195-196. 

Minimum  average  cost,  192-197,  207, 
215-216,  413,  663. 

Minimum-comparison  algorithms,  180-219. 
for  merging,  197-207. 
for  searching,  413-414. 
for  selection,  207-219. 
for  sorting,  180-197. 

Minimum  path  length,  192,  195,  361. 

weighted,  196,  337,  361,  438,  451,  458. 
Minimum-phase  tape  sorting,  311. 
Minimum-space algorithms,
for  merging,  168,  390,  702. 
for  rearranging,  178,  617. 
for  selection,  218. 
for  sorting,  79-80,  389. 
for  stable  sorting,  390. 

Minimum-stage  tape  sorting,  311. 
Minimum-time  algorithms, 
for  merging,  241. 
for  sorting,  235,  241. 

Minker,  Jack,  578. 

MinuteSort,  390. 

Mises,  Richard,  Edler  von,  513. 

Misspelled  names,  394-395. 

Mitchell,  Oscar  Howard,  69,  611. 

MIX  computer:  A hypothetical  machine 
defined  in  Section  1.3,  vi,  75,  382. 

MIXAL:  The  MIX  assembly  language,  426. 

MIXT  tape  units,  318-320,  330-331,  358. 
MIXTEC  disks  and  drums,  357-358,  562-563. 
Miyakawa, Masahiro, 727.
Möbius, August Ferdinand, function, 33.

Modified  external  path  length,  502-503,  511. 
Moffat,  Alistair,  389. 

Molodowitch,  Mariko,  742. 

Monomial  symmetric  function,  609. 
Monotone  Boolean  functions,  239. 

Monotonic  subsequences,  66,  68,  614. 
Monotonicity  property,  439. 

Monting,  see  Schulte  Monting. 

Mooers,  Calvin  Northrup,  571. 

Moore,  Edward  Forrest,  255,  263,  453-454. 
Moore  School  of  Electrical  Engineering,  386. 
Morgenthaler,  John  David,  713. 

Morris,  Robert,  548,  738,  741. 

Morrison,  Donald  Ross,  498. 

Morse,  Samuel  Finley  Breese,  code,  623. 
Mortenson,  John  Albert,  684. 

Moser,  Leo,  64. 

Most-significant-digit-first  radix  sort, 
175-179. 

Motzkin, Theodor (= Theodore) Samuel, 704.
Move-to-front heuristic, 402-403, 405-406, 646.

MSD:  Most  significant  digit,  175. 

Muir,  Thomas,  11. 

Mullin,  James  Kevin,  573,  583. 
Multi-attribute  retrieval,  395,  see  Secondary 
key  retrieval. 

Multidimensional  binary  search  trees, 
see k-d trees.

Multidimensional tries, see k-d tries.
Multihead  bubble  sort,  244-245. 

Multikey  quicksort,  389,  633,  728. 

Multilist  system,  561,  562,  578. 

Multinomial  coefficients,  23,  30,  32,  457,  735. 
Multiple  list  insertion  sort,  99-102,  104-105, 
196,  380,  382,  520. 


Multiple-precision  constants,  155,  566,  644, 
648,  650,  713,  715,  726,  748-750. 
Multiples  of  an  irrational  number  mod  1, 
xiv,  517-518,  550. 

Multiprecision  comparison,  6,  136,  169. 
Multiprocessing,  267,  390. 

Multireel  files,  337,  342,  348,  356. 

Multiset:  Analogous  to  a set,  but  elements 
may  appear  more  than  once,  22,  158, 
211,  217,  241,  298,  648,  670,  744. 
ordering,  311. 

permutations,  22-35,  42-45,  66. 
sum  and  union,  597. 

Multivalued  logic,  672. 

Multiway  trees,  453,  481-491,  707, 
see  also  Tries. 

Multiword  keys,  136,  168. 

Munro,  James  Ian,  218,  435,  478,  533,  583, 
655,  708,  734,  741,  742. 

Muntz,  Richard,  482,  490. 

Muroga, Saburo, 729.

Music,  23-24. 

Musser,  David  Rea,  122. 

Myers,  Eugene  Wimberly,  Jr.,  583. 

Nagler,  Harry,  82,  347,  646,  648. 

Nakayama, Tadasi (中山正), 69, 612.

Naor, Simeon (= Moni), 708.

Narasimhan, Balasubramanian, 707.
Narayana Pandita, son of Nrsimha, 270.

Natural  correspondence  between  forests 
and  binary  trees,  706. 

Natural  merge  sort,  160-162,  167. 

Natural  selection,  259-261,  263-266. 

Nearest  neighbors,  563,  566. 

Needle,  1,  569,  572,  573-574. 

Negative  links,  164,  175. 

Neighbors  of  a point,  563,  566. 

Neimat, Marie-Anne Kamal, 549.

Nelson,  Raymond  John,  225,  226,  244,  245. 
Netto, Otto Erwin Johannes Eugen, 286, 592.

Networks  of  comparators, 

for  merging,  223-226,  230-232,  237,  239. 
for  permutations,  243-244. 
for  selection,  232-234,  238. 
for  sorting,  219-247. 
primitive,  240,  668. 
standard,  234,  237-238,  240,  244. 
with  minimum  delay,  228-229,  241. 
Networks  of  workstations,  267,  390. 
Neumann,  John  von  (=  Margittai  Neumann 
Janos),  8,  159,  385. 

Newcomb,  Simon,  42,  45. 

Newell,  Allen,  729. 

Newman,  Donald  Joseph,  505. 

Nielsen,  Jakob,  511-512. 
Nievergelt, Jürg, 476, 480, 549, 564.
Nijenhuis,  Albert,  70. 

Nikitin, Andrei Ivanovich (Никитин, Андрей Иванович), 351.

Nitty-gritty,  317-343,  357-379. 

Nodine,  Mark  Howard,  698. 

Non-messing-up  theorem,  90,  238,  668-669. 
Nondeterministic  adversary,  219. 

Norlund,  Niels  Erik,  638. 

Normal  deviate,  69. 

Normal  distribution,  approximately,  45,  650. 
Norwegian  language,  520. 

Noshita, Kohei, 213, 218.
Notations,  index  to,  752-756. 

Novelli,  Jean-Christophe,  614. 

NOW-Sort,  390. 

NP-complete  problems,  242. 

Nsort,  390. 

Null  permutation,  25,  36. 

Number-crunching computers, 175, 381, 389-390.

Numerical  instability,  41. 

Nyberg,  Christopher,  390. 

Oberhettinger,  Fritz,  131. 

Oblivious  algorithms,  219-220. 

O’Connor,  Daniel  Gerard,  225. 

Octrees,  565. 

Odd-even merge, 223-226, 228, 230, 243, 244.

Odd-even  transposition  sort,  240. 

Odd  permutations,  19,  196. 

Odell,  Margaret  K.,  394. 

Odlyzko,  Andrew  Michael,  630,  715. 
Oettinger,  Anthony  Gervin,  491. 

Okoma, Seiichi, 644.

Oldham,  Jeffrey  David,  vii. 

Olivie,  Hendrik  Johan,  477. 

Olson,  Charles  A.,  544. 

Omega  network,  227,  236-237. 

One-sided  height-balanced  trees,  480. 
One-tape  sorting,  353-356. 

O’Neil,  Patrick  Eugene,  489. 

Ones’  complement  notation,  177. 

Online  merge  sorting,  167. 

Open addressing, 525-541, 543-544, 548, 551-557.

optimum,  539-541,  555-556. 

Operating  systems,  149,  158,  338. 
Optimization of loops, 85, 104-105, 136, 156, 167, 397-398, 405, 418, 423, 425, 429, 551, 625.

Optimization  of  tests,  406. 

Optimum  binary  search  trees,  436-454, 
456-458,  478. 

Optimum  digital  search  trees,  511. 

Optimum  exchange  sorting,  196. 

Optimum  linear  arrangements,  408. 
Optimum  linear  probing,  532. 

Optimum  linked  trie,  508. 


Optimum merge patterns, 302-311, 363-367, 376-377.

Optimum  open  addressing,  539-541, 
555-556. 

Optimum  permutations,  403-408. 

Optimum polyphase merge, 274-279, 286, 337.

Optimum  searching,  413,  425,  549. 

Optimum  sorting,  180-247. 

OR  (bitwise  or),  529,  571. 

Order  ideals,  669. 

Order  relations,  4. 

Order  statistics,  6,  44. 

Ordered  hashing,  531,  741. 

Ordered  partitions,  286-287. 

Ordered  table,  searching  an,  398-399, 
409-426. 

Ordering  of  permutations,  19,  22,  105,  670. 
Organ-pipe  order,  407,  452,  704. 

Oriented  trees,  47,  71,  599. 

Orosz,  Gabor,  745. 

O’Rourke,  Joseph,  566. 

Orthogonal  range  queries,  564,  566. 
Oscillating  radix  sort,  347. 

Oscillating sort, 311-317, 328-329, 334, 337, 338, 342, 389.

Overflow,  arithmetic,  6,  519,  585. 
in  B-trees,  487-490. 
in hash tables, 522, 525, 526, 529, 542-543, 547, 551.

Overmars,  Markus  (=  Mark)  Hendrik,  583. 
Own  coding,  339. 

P-way merging, 166, 252-254, 321-324,
339-343,  360-373,  379. 

Packing,  721. 

Page,  Ralph  Eugene,  385. 

Paging,  158,  378,  481-482,  541,  547. 

Pagodas,  152. 

Paige,  Robert  Allan,  652. 

Painter,  James  Allan,  256. 

Pairing  heaps,  152. 

Pak, Igor Markovich (Пак, Игорь Маркович), 70, 614.

Palermo,  Frank  Pantaleone,  729. 

Pallo,  Jean  Marcel,  718. 

Panny,  Wolfgang  Christian,  629,  630,  648. 
Papernov, Abram Alexandrovich (Папернов, Абрам Александрович), 91.

Pappus of Alexandria (Πάππος ὁ Ἀλεξανδρινός), 593.

Parallel  processing,  267,  370,  390,  693. 
merging,  231,  241. 
searching,  425. 

sorting, 113, 222-223, 228-229, 235, 390, 671.

Parberry,  Ian,  666. 

Pardo,  see  Trabb  Pardo. 

Pareto,  Vilfredo,  401. 

Pareto  distribution,  401,  405,  710. 

Parker,  Ernest  Tilden,  8. 

Parkin,  Thomas  Randall,  8. 
Parking problem, 552, 553, 742.

Parsimonious  algorithms,  391. 

Partial  match  retrieval,  559-582. 

Partial ordering, 31-32, 68-69, 183-184, 187, 216, 217.

of  permutations,  19,  22,  105,  670. 
Partition-exchange  sort,  114. 

Partitioning a file, 113-115, 123-124, 128, 136.

into  three  blocks,  137. 

Partitions  of  a set,  605,  653. 

Partitions of an integer, 19-20, 504, 613, 621, 697, 700.
ordered,  286-287. 
plane,  69-70. 

Pasanen,  Tomi  Armas,  649. 

Pass:  Part  of  the  execution  of  an  algorithm, 
traversing  all  the  data  once,  5,  268,  272. 
Patashnik,  Oren,  762. 

Patents,  vi,  225,  231,  244,  255,  312,  315-316, 
369,  384-385,  394,  675,  729. 

Paterson,  Michael  Stewart,  152,  215,  230, 
594,  655,  689,  736. 

Path  length  of  a tree,  see  External  path 
length,  Internal  path  length, 
minimum,  192,  195,  361. 
weighted,  196,  337,  361,  438,  451,  458. 
weighted  by  degrees,  363-367. 

Paths  on  a grid,  86-87,  102-103,  112-113, 
134,  579. 

Patricia,  489,  498-500,  505-506,  508, 
510-511,  576,  726. 

Patt,  Yale  Nance,  508. 

Pattern  matching  in  text,  511,  572,  578. 
Patterns  in  permutations,  61. 

Patterson,  David  Andrew,  390. 

Patterson,  George  William,  386,  422. 

Peaks  of  a permutation,  47,  604. 

Peczarski,  Marcin  Piotr,  192. 

Pentagonal  numbers,  15,  19. 

Percentiles,  136,  207-219,  472,  see 
also  Median. 

Perfect  balancing,  480. 

Perfect distributions, 268-272, 276-277, 286, 288-289.

Perfect  hash  functions,  513,  549. 

Perfect  shuffles,  237. 

Perfect  sorters,  245. 

Periodic  sorting  networks,  243. 

Perl, Yehoshua, 673, 707.
Permanent,  660. 

Permutahedron,  13,  18,  240,  593. 
Permutation  in  place,  79-80,  178. 
Permutation  networks,  243-244. 
Permutations,  11-72,  579. 

2-ordered,  86-88,  103,  112-113,  134. 
cycles  of,  25-32,  62,  156,  617,  628, 
639-640,  657. 
enumeration  of,  12,  22-24. 
even,  19,  196. 


factorization  of,  25-35. 
fixed  points  of,  62,  66,  617. 
indexes  of,  16-17,  21-22,  32. 
intercalation  product  of,  24-35. 
inverses  of,  13-14,  18,  48,  53-54,  74, 
134,  596,  605. 

inversions  of,  see  Inversion  tables, 
Inversions. 

lattice  of,  13,  19,  22,  628. 

matrix  representations  of,  14,  48. 

of  a multiset,  22-35,  42-45,  66. 

optimum,  403-408. 

partial  orderings  of,  19,  22,  105,  670. 

pessimum,  405. 

readings  of,  46-47. 

runs  of,  35-47,  248,  259-266,  387. 

signed,  615. 

two-line  notation  for,  13-14,  24,  35, 
43-44,  51-54,  64-65. 
up-down,  68. 

Persistent  data  structures,  583. 
Perturbation  trick,  404. 

Pessimum  binary  search  trees,  457,  711. 
Peter,  Laurence  Johnston,  principle,  143. 
Peterson, William Wesley, 396, 422, 526, 534, 538, 548.

Petersson,  Ola,  389. 

Pevzner, Pavel Arkadjevich (Певзнер, Павел Аркадьевич), 615.

Peyster,  James  Abercrombie  de,  Jr.,  544. 
Phi (φ), xiv, 138, 517-518, 748-749.
Philco  2000  computer,  256. 

Pi (π), 372, 520, 748-749.

as “random” example, 17, 370, 385, 547, 552, 733.

Picard, Claude François, 183, 196, 215.
Ping-pong  tournament,  142. 

Pinzka,  Charles  Frederick,  728. 

Pipeline  computers,  175,  381,  389-390. 
Pippenger,  Nicholas  John,  215,  234,  549. 
Pitfalls,  41,  707,  729. 

Pittel, Boris Gershon (Питтель, Борис Гершонович), 713, 721, 728, 734.
PL/I  language,  339,  532. 

Plane  partitions,  69-70. 

Plankalkül, 386.

Plaxton,  Charles  Gregory,  623,  667. 
Playing  cards,  42-45,  169,  178,  370. 
Plücker, Julius, 745.

Poblete  Olivares,  Patricio  Vicente,  646, 
740,  741,  742. 

Pocket  sorting,  343. 

Podderjugin, Viktor Denisowitsch (Поддерюгин, Виктор Денисович), 548.

Pohl,  Ira  Sheldon,  218,  663. 

Pohlig,  Stephen  Carl,  591. 

Point  quadtrees,  566. 

Poisson,  Simeon  Denis,  distribution,  555. 

transform,  734. 

Polish  prefix  notation,  3. 

Pollard,  John  Michael,  591,  669,  672. 
Pólya, György (= George), 599, 704.
Polygons,  regular,  289. 

Polynomial  arithmetic,  165,  520. 
Polynomial  hashing,  520,  550. 

Polyphase merge sorting, 268-287, 297, 298, 300, 311, 325-326, 333, 342, 346, 389, 425.

Caron  variation,  279-280,  286-287. 
optimum,  274-279,  286,  337. 
read-backward, 300-302, 308, 328, 334, 338, 342.

tape-splitting,  282-285,  287,  298, 
326-327,  333,  338. 

Polyphase  radix  sorting,  348. 

Pool,  Jan  Albertus  van  der,  739. 

Pool  of  memory,  369. 

Poonen,  Bjorn,  104. 

Porter, Thomas K., 642.

Post  office,  175. 

Post-office  trees,  563-564,  746. 

Posting,  see  Insertion. 

Pouring  liquid,  672. 

Power  of  merge,  676,  see  Growth  ratio. 
Powers,  James,  385. 

Pratt,  Richard  Don,  310. 

Pratt, Vaughan Ronald, 91, 104, 245, 457, 622, 675, 701.

sorting  method,  91-93,  104,  113,  235. 
Prediction,  see  Forecasting. 

Preferential  arrangements,  see  Weak 
orderings. 

Prefetching,  369-373. 

Prefix,  492. 

Prefix  code,  452-453. 

for  all  nonnegative  integers,  6. 

Prefix  search,  see  Trie  search. 

Preorder merge, 307-309.

Prestet,  Jean,  24. 

Prime  numbers,  156,  516,  529,  557,  627. 
Primitive  comparator  networks,  240,  668. 
Principle  of  optimality,  363,  438. 

Pring,  Edward  John,  564. 

Prins,  Jan  Fokko,  618. 

Priority  deques,  157. 

Priority queues, 148-152, 156-158, 253, 646, 705.
merging,  150,  157. 

Priority  search  trees,  578. 

Probability  density  functions,  177. 
Probability  distributions,  105,  399-401. 
beta,  586. 

binomial,  100-101,  341,  539,  555. 
fractal,  400. 
normal,  45,  69,  650. 

Pareto,  401,  405,  710. 

Poisson,  555. 
random,  458. 

uniform,  6,  16,  20,  47,  127,  606. 

Yule,  401,  405. 

Zipf,  400,  402,  435,  455. 


Probability generating functions, 15-16, 102, 104, 135, 177, 425, 490, 539, 553, 555, 739.

Prodinger,  Helmut,  576,  634,  644,  648,  726. 
Product  of  consecutive  binomial 
coefficients,  612. 

Proof of algorithms, 49-51, 112-113, 315, 323, 355, 677.

Prusker,  Francis,  377. 

Prywes,  Noah  Shmarya,  578. 

Pseudolines,  670. 

Psi function, 637, 751.

Puech, Claude Henri Clair Marie Jules, 565, 566, 576.

Pugh,  William  Worthington,  Jr.,  213,  478. 
Punched  cards,  169-170,  175,  383-385. 
Pyke,  Ronald,  732. 

q-multinomial  coefficients,  32. 
q-nomial coefficients, 32, 594, 595.
q-series,  20,  32,  594-596,  644. 

Quadrangle  inequality,  457. 

Quadratic  probing,  551. 

Quadratic  selection,  141. 

Quadruple  systems,  581,  746. 

Quadtrees,  565-566,  581,  746. 

Queries,  559-582. 

Questionnaires,  183. 

Queues, 135, 148-149, 156, 171, 299, 310, 322-323.

Quickfind, 136.
  median-of-three, 634.

Quicksort, 113-122, 135-138, 148, 159, 246, 349-351, 356, 381, 382, 389, 431, 698.
  binary, see Radix exchange.
  median-of-three, 122, 136, 138, 381, 382.
  multikey, 389, 633, 728.
  with equal keys, 136, 635-636.

Rabbits,  424. 

Rabin, Michael Oser, 242.
Radix-2  sorting,  387. 

Radix exchange sort, 122-128, 130-133, 136-138, 159, 177, 351, 382, 389, 500-501, 509, 698.
  with equal keys, 127-128, 137.

Radix  insertion  sort,  176-177. 

Radix  list  sort,  171-175,  382. 

Radix sorting, 5, 169-179, 180-181, 343-348, 351, 359, 374, 381, 385, 389, 421, 502, 698.
  dual to merge sorting, 345-348, 359.
Radke,  Charles  Edwin,  297. 

Räihä, Kari-Jouko, 717.

Railway  switching,  168. 

Rais,  Bonita  Marie,  726. 

Raman, Rajeev, 634.

Raman, Venkatesh, 655.
Ramanan, Prakash Viriyur, 218.

Ramanujan Iyengar, Srinivasa, function Q(n), 701.

Ramshaw,  Lyle  Harold,  729. 

Random data for sorting, 20, 47, 76, 383, 391.

Random  probability  distribution,  458. 

Random probing, independent, 548, 555.
  with secondary clustering, 548, 554.

Randomized adversary: An adversary that flips coins, 219.

Randomized algorithms, 121-122, 351, 455, 517, 519, 557-558.

Randomized  binary  search  trees,  478. 

Randomized  data  structures,  vii,  478. 

Randomized  striping,  371-373,  379,  698. 

Randow, Rabe-Rüdiger von, 606.

Randrianarimanana,  Bruno,  713. 

Raney,  George  Neal,  297,  298. 

Range  queries,  559,  578. 

RANK  field,  471,  476,  479,  713,  718. 

Ranking,  181,  see  Sorting. 

Raver,  Norman,  729. 

Ravikumar, Balasubramanian, 673.

Rawlings,  Don  Paul,  595. 

Ray-Chaudhuri, Dwijendra Kumar, 578.

Read-back  check,  360. 

Read-backward balanced merge, 327-328, 334.

Read-backward  cascade  merge,  328,  334. 

Read-backward polyphase merge, 300-302, 308, 328, 334, 338, 342.

Read-backward  radix  sort,  346-347. 

Read-forward oscillating sort, 315-316, 329, 334.

Reading tape backwards, 299-317, 349, 356, 387.

Readings  of  a permutation,  46-47. 

Real-time  applications,  547. 

Rearrangements of a word, see Permutations of a multiset.

Rearranging  records  in  place,  80,  178. 

Rebalancing a tree, 461, 463-464, 473-474, 479; see also Reorganizing.

Reciprocals,  420. 

Records,  4,  392. 

Recurrence relations, techniques for solving, 120, 135, 137, 168, 185-186, 205-206, 224-225, 283, 356, 424, 430-431, 467, 490, 502, 506, 604-605, 638-639, 648, 674, 688, 725, 737.

Recurrence relations for strings, 274-275, 284, 287, 308.

Recursion  induction,  315. 

Recursion  versus  iteration,  168,  313,  717. 

Recursive methods, 114, 214, 218, 243, 313, 350, 452, 592, 596, 713, 715, 717.


Red-black  trees,  477. 

Redundant comparisons, 182, 240, 242, 245-246, 391.

Reed,  Bruce  Alan,  643,  713. 

Reference  counts,  534. 

Reflection  networks,  670. 

Régnier, Mireille, 565, 632.

Regular  polygons,  289. 

Reiner,  Victor  Schorr,  719. 

Reingold, Edward Martin, 207, 476, 480, 715.
Relaxed  heaps,  152. 

Remington  Rand  Corporation,  385,  387. 
Removal,  see  Deletion. 

Reorganizing  a binary  tree,  458,  480. 
Replacement selection, 212, 253-266, 325, 329, 331-332, 336, 347, 348, 360, 364-365, 378.

Replicated  blocks,  489. 

Replicated instructions, 398, 418, 429, 625, 648, 677.

Reservoir,  259-261,  265. 

Restructuring,  480. 

Reversal  of  data,  65,  72,  310,  670,  701. 
Reverse  lexicographic  order,  394. 

Rewinding tape, 279-287, 297, 299-300, 316, 319-320, 326, 331, 342, 407.

Ribenboim,  Paulo,  584. 

Rice,  Stephan  Oswald,  138. 

Richards,  Ronald  Clifford,  479. 

Richmond,  Lawrence  Bruce,  726. 

Riemann, Georg Friedrich Bernhard, integration, 177, 652.

Riesel,  Hans  Ivar,  406. 

Right-threaded  trees,  267,  454-455,  464. 
Right-to-left (or left-to-right) maxima or minima, 12-13, 27, 82, 86, 100, 105, 156, 624.

Riordan,  John,  39,  46,  679,  732,  733,  738. 
RISC  computers,  175,  381,  389-390. 

Rising,  Hawley,  128. 

Rivest, Ronald Linn, 214, 215, 389, 403, 477, 573-576, 580, 747.

Roberts,  Charles  Sheldon,  573. 

Robin  Hood  hashing,  741-742. 

Robinson,  Gilbert  de  Beauregard,  58,  60. 
Robson,  John  Michael,  565,  713. 

Rochester,  Nathaniel,  547. 

Rodgers,  William  Calhoun,  704. 

Rodrigues,  Benjamin  Olinde,  15,  592. 
Roebuck,  Alvah  Curtis,  757. 

Rogers,  Lawrence  Douglas,  707. 

Rohnert,  Hans,  549. 

Rollett,  Arthur  Percy,  593. 

Rooks,  46,  69. 

Rose,  Alan,  672. 

Roselle,  David  Paul,  47,  597. 

Rosenstiehl,  Pierre,  593. 

Rösler, Uwe, 632.

Rosser,  John  Barkley,  672. 
Rost, Hermann, 614, 671.

Rotations in a binary tree, 481.
  double, 461, 464, 477.
  single, 461, 464, 477.

Rotem, Doron, 61.

Rothe,  Heinrich  August,  14,  48,  62,  592. 
Rouché, Eugène, theorem, 681.

Roura  Ferret,  Salvador,  478. 

Roving  pointer,  543. 

Rovner,  Paul  David,  578. 

Royalties,  use  of,  407. 

Rubin,  Herman,  728. 

Rudolph,  Lawrence  Set,  673. 

Runs of a permutation, 35-47, 248, 259-266, 387.

Russell,  Robert  Clifford,  394. 

Russian  roulette,  21. 

Rustin,  Randall  Dennis,  315,  353. 

Rytter,  Wojciech,  454. 

Sable,  Jerome  David,  578. 

Sackman,  Bertram  Stanley,  279,  684. 

Sagan,  Bruce  Eli,  48. 

Sager,  Thomas  Joshua,  513. 

Sagiv, Yehoshua Chaim, 721.

Sagot,  Marie-France,  615. 

Saks,  Michael  Ezra,  452,  660,  673. 

Salveter,  Sharon  Caroline,  477. 

Salvy,  Bruno,  565. 

Samadi, Behrokh, 721.

Samet, Hanan, 566.

Samplesort,  122,  720. 

Sampling,  587. 

Samuel, son of Elkanah, 481.

Samuel,  Arthur  Lee,  547. 

Sandelius,  David  Martin,  656. 

Sankoff,  David  Lawrence,  614. 

Sapozhenko, Alexander Antonovich (Сапоженко, Александр Антонович), 669.

Sarnak,  Neil  Ivor,  583. 

Sasson, Azra, 369.

Satellite information: Record minus key, 4, 74.

Satisfiability,  242. 

Saul, son of Kish, 481.

Sawtooth  order,  452. 

Sawyer,  Thomas,  747. 

SB-tree,  489. 

Scatter  storage,  514. 

Schachinger,  Werner,  576. 

Schaffer,  Alejandro  Alberto,  708. 

Schaffer,  Russel  Warren,  155,  157,  645. 
Schay, Géza, Jr., 538, 555, 729.

Schensted, Craige Eugene (= Ea Ea), 57-58, 66.

Scherk,  Heinrich  Ferdinand,  644. 


Schkolnick,  Mario,  721. 

Schlegel,  Stanislaus  Ferdinand  Victor,  270. 

Schlumberger,  Maurice  Lorrain,  366. 

Schmidt,  Jeanette  Pruzan,  708,  742. 

Schneider,  Donovan  Alfred,  549. 

Schönhage, Arnold, 215, 218.

Schott, René Pierre, 713.

Schreier,  Jozef,  209. 

Schulte Mönting, Jürgen, 192, 659.

Schur,  Issai,  function,  611-612. 

Schützenberger, Marcel Paul, 17, 21, 39, 55, 57-58, 66, 68, 70, 670.

Schwartz,  Eugene  Sidney,  401. 

Schwartz,  Jules  Isaac,  128. 

Schwiebert,  Loren  James,  II,  229. 

Scoville,  Richard  Arthur,  47. 

Scrambling  function,  517,  590,  709. 

Search-and-insertion  algorithm,  392. 

Searching, 392-583; see External searching, Internal searching; Static table searching, Symbol table algorithms.
  by comparison of keys, 398-399, 409-491, 546-547.
  by digits of keys, 492-512.
  by key transformation, 513-558.
  for closest match, 9, 394, 408, 563, 566, 581.
  for partial match, 559-582.
  geometric data, 563-566.
  history, 395-396, 420-422, 453, 547-549, 578.
  methods, see B-trees, Balanced trees, Binary search, Chaining, Fibonaccian search, Interpolation search, Open addressing, Patricia, Sequential search, Tree search, Trie search.
  optimum, 413, 425, 549; see also Optimum binary search trees, Optimum digital search trees.
  parallel, 425.
  related to sorting, v, 2, 393-394, 409, 660.
  text, 511, 572, 578.
  two-dimensional, 207.

Sears,  Richard  Warren,  757. 

Secant  numbers,  610-611. 

Secondary  clustering,  529,  551,  554. 

Secondary  hash  codes,  741. 

Secondary  key  retrieval,  395,  559-582. 

Sedgewick, Robert, 91, 93, 95, 114, 115, 122, 136, 152, 155, 157, 477, 512, 623, 629, 630, 633, 638, 645, 674, 726.

Seeding  in  a tournament,  208. 

Seek time, 358, 362-365, 368-369, 407, 562-563.

Sefer Yetzirah, 23.

Seidel,  Philipp  Ludwig  von,  611. 

Seidel,  Raimund,  478. 

Selection of t largest, 218-219, 408.
  networks for, 232-234, 238.

Selection of tth largest, 136, 207-219, 472.
  networks for, 234, 238.

Selection  sorting,  54-55,  73,  138-158,  222. 
Selection trees, 141-144, 252, 256-258.
Self-adjusting  binary  trees,  see  Splay  trees. 
Self-inverse permutations, 599, see also Involutions.

Self-modifying  programs,  85,  107,  174,  640. 
Self-organizing files, 401-403, 405-406, 478, 521, 646.

Selfridge,  John  Lewis,  8. 

Senko,  Michael  Edward,  487. 

Sentinel: A special value placed in a table, designed to be easily recognizable by the accompanying program, 4, 105, 159, 252, 308, 387.

Separation  sorting,  343. 

Sequential allocation, 96, 149, 163-164, 170-171, 386, 459.

Sequential  file  processing,  2-3,  6-10,  248. 
Sequential  search,  396-409,  423. 

Sets, testing equality, 207.
  testing inclusion, 393-394.

Sevcik,  Kenneth  Clem,  564. 

Seward, Harold Herbert, 79, 170, 255, 387, 670, 696.

Sexagesimal  number  system,  420. 

Seymour,  Paul  Douglas,  402. 

Shackleton,  Patrick,  136. 

Shadow  keys,  588. 

Shanks,  Daniel  Charles,  591. 

Shannon,  Claude  Elwood,  Jr.,  442,  457,  712. 
Shapiro,  Gerald  Norris,  226-227,  229,  243. 
Shapiro,  Henry  David,  668. 

Shapiro,  Louis  Welles,  607. 

Shar,  Leonard  Eric,  416,  423,  706. 

Shasha,  Dennis  Elliott,  488. 

Shearer,  James  Bergheim,  660. 

Sheil,  Beaumont  Alfred,  457. 

Shell,  Donald  Lewis,  83,  93,  279. 

Shellsort, 83-95, 98, 102-105, 111, 148, 380, 382, 389, 669, 698.

Shepp,  Lawrence  Alan,  611. 

Sherman,  Philip  Martin,  492. 

Shields,  Paul  Calvin,  728. 

Shift-register  device,  407. 

Shifted  tableaux,  67. 

Shockley,  William  Bradford,  668. 

Sholmov, Leonid Ivanovich (Шолмов, Леонид Иванович), 351.

Shrairman,  Ruth,  152. 

Shrikhande, Sharadchandra Shankar, 746.

Shuffle  network,  227,  236-237. 

Shuffling,  7,  237. 

SICOMP: SIAM Journal on Computing, published by the Society for Industrial and Applied Mathematics since 1972.
Sideways  addition,  235,  643,  644,  717. 

Siegel,  Alan  Richard,  708,  742. 

Siegel,  Shelby,  623. 

Sifting,  80,  see  Straight  insertion. 

Siftup,  70,  145-146,  153-154,  157. 


Signed  magnitude  notation,  177. 

Signed  permutations,  615. 

Silicon  Graphics  Origin2000,  390. 

Silver,  Roland  Lazarus,  591. 

Silverstein,  Craig  Daryl,  152. 

Simon, István Gusztáv, 642.

Simulation,  351-353. 

Singer,  Theodore,  279. 

Singh, Parmanand, 270.

Single  hashing,  556-557. 

Single  rotation,  461,  464,  477. 

Singleton, Richard Collom, 99, 115, 122, 136, 572, 581.

Sinking  sort,  80,  106,  see  Straight  insertion. 

Skew  heaps,  152. 

Skip  lists,  478. 

Slagle,  James  Robert,  704. 

SLB  (shift  left  rAX  binary),  516,  529. 

Sleator, Daniel Dominic Kaplan, 152, 403, 478, 583, 718.

Sloane,  Neil  James  Alexander,  479. 

Słupecki, Jerzy, 209.

Smallest-in-first-out,  see  Priority  queues. 

Smith,  Alan  Jay,  168,  695. 

Smith,  Alfred  Emanuel,  392. 

Smith,  Cyril  Stanley,  593. 

Smith,  Wayne  Earl,  405. 

Snow  job,  255-256,  260-261,  263-266. 

Snyder Holberton, Frances Elizabeth, 324, 386, 387.

Sobel,  Milton,  212,  215,  216,  217,  218. 

Sobel,  Sheldon,  311,  316. 

SODA: Proceedings of the ACM-SIAM Symposia on Discrete Algorithms, inaugurated in 1990.

Software,  387-390. 

Solitaire  (patience),  42-45. 

Sort  generators,  338-339,  387-388. 

Sorting (into order), 1-391; see External sorting, Internal sorting; Address calculation sorting, Enumeration sorting, Exchange sorting, Insertion sorting, Merge sorting, Radix sorting, Selection sorting.
  adaptive, 389.
  by counting, 75-80.
  by distribution, 168-179.
  by exchanging, 105-138.
  by insertion, 73, 80-105, 222.
  by merging, 98, 158-168.
  by reversals, 72.
  by selection, 138-158.
  history, 251, 383-390, 421.
  in O(N) steps, 5, 102, 176-179, 196, 616.
  into unusual orders, 7-8.
  methods, see Binary insertion sort, Bitonic sort, Bubble sort, Cocktail-shaker sort, Comparison counting sort, Distribution counting sort, Heapsort, Interval exchange sort, List insertion sort, List merge sort, Median-of-three quicksort, Merge exchange sort, Merge insertion sort, Multiple list insertion sort, Natural merge sort, Odd-even transposition sort, Pratt sort, Quicksort, Radix exchange sort, Radix insertion sort, Radix list sort, Samplesort, Shellsort, Straight insertion sort, Straight merge sort, Straight selection sort, Tree insertion sort, Tree selection sort, Two-way insertion sort; see also Merge patterns.
  networks for, 219-247.
  optimum, 180-247.
  parallel, 113, 222-223, 228-229, 235, 390, 671.
  punched cards, 169-170, 175, 383-385, 694.
  related to searching, v, 2, 393-394, 409, 660.
  stable, 4-5, 17, 24, 25, 36-37, 79, 102, 134, 155, 167, 347, 390, 584, 615, 653.
  topological, 9, 31-32, 62, 66-67, 187, 216, 393, 593.
  two-line arrays, 34.
  variable-length strings, 177, 178, 489, 633.
  with one tape, 353-356.
  with two tapes, 348-353, 356.

Sós, Vera Turán Pálné, 518, 747.

Soundex,  394-395. 

Spacings,  458. 

Sparse  arrays,  721-722. 

Spearman,  Charles  Edward,  597. 

Speedup,  see  Loop  optimization. 

Spelling  correction,  394,  573. 

Sperner,  Emanuel,  theorem,  744. 

Splay  trees,  478. 

Splitting  a balanced  tree,  474-475,  480. 

Sprugnoli,  Renzo,  513. 

Spruth,  Wilhelm  Gustav  Bernhard,  538,  555. 

Spuler,  David  Andrew,  711. 

SRB (shift right rAX binary), 125-126, 134, 411.

Stable  merging,  390. 

Stable sorting, 4-5, 17, 24, 25, 36-37, 79, 102, 134, 155, 167, 347, 390, 584, 615, 653.

Stacks, 21, 60, 114-117, 122, 123-125, 135, 148, 156, 168, 177, 299, 310, 350, 473.

Stacy,  Edney  Webb,  704. 

Staël-Holstein, Anne Louise Germaine Necker, Baronne de, 589.

Standard networks of comparators, 234, 237-238, 240, 244.

Stanfel,  Larry  Eugene,  457. 

Stanley, Richard Peter, 69, 600, 605, 606, 670, 671.

Stasevich, Grigory Vladimirovich (Стасевич, Григорий Владимирович), 91.

Stasko,  John  Thomas,  152. 

Static table searching, 393, 409-426, 436-458, 492-496, 507-508, 513-515.

Stearns,  Richard  Edwin,  351,  356. 


Steiner,  Jacob,  745. 

Steiner triple systems, 576-577, 580-581, 745.

Steinhaus,  Hugo  Dyonizy,  186,  209,  422,  518. 

Stepdowns,  160,  262. 

Stevenson,  David  Kurl,  671. 

Stirling, James,
  approximation, 63, 129, 182, 197.
  numbers, 45, 175, 455, 602, 653, 679, 739, 754.

STOC: Proceedings of the ACM Symposia on Theory of Computing, inaugurated in 1969.

Stockmeyer,  Paul  Kelly,  202. 

Stone,  Harold  Stuart,  237,  425. 

Stop/start  time,  319-320,  331,  342. 

Stoyanovskii, Alexander Vasil’evich (Стояновский, Александр Васильевич), 70, 614.

Straight insertion sort, 80-82, 96, 102, 105, 110, 116-117, 127, 140, 148, 163, 222-223, 380, 382, 385, 386, 390, 676.

Straight merge sort, 162-163, 167, 183, 193, 387.

Straight selection sort, 110, 139-140, 148, 155-156, 381, 382, 387, 390.

Stratified  trees,  152. 

Straus,  Ernst  Gabor,  704. 

Strings: Ordered subsequences, 248, see Runs.

Strings: Sequences of items, 22, 27-28, 72, 248, 494.
  recurrence relations for, 274-275, 284, 287, 308.
  sorting, 177, 178, 489, 633.

Striping,  342,  370-373,  378,  379,  389,  698. 

Strong,  Hovey  Raymond,  Jr.,  549. 

Strongly  T-fifo  trees,  310-311,  345,  348. 

Successful  searches,  392,  396,  532,  550. 

Sue, Jeffrey Yen, 693.

Suel,  Torsten,  623,  667. 

Sugito, Yoshio, 727.

Sum  of  uniform  deviates,  47. 

Summation  factor,  120. 

Sun  SPARCstation,  782. 

Superblock  striping,  370,  371,  379. 

Superfactorials,  612. 

Superimposed  coding,  570-573,  579. 

Surnames,  encoding,  394-395. 

Sussenguth,  Edward  Henry,  Jr.,  496. 

Świerczkowski, Stanisław Sławomir, 518.

Swift,  Jonathan,  vii. 

Sylvester,  James  Joseph,  622. 

Symbol table algorithms, 3, 426-435, 455, 496-512, 520-558.

Symmetric  binary  B-trees,  477. 

Symmetric  functions,  239,  608-609. 

Symmetric  group,  48,  see  Permutations. 

Symmetric order: Left subtree, then root, then right subtree, 412, 427, 658.

Symvonis, Antonios (Συμβώνης, Αντώνιος), 702.

SyncSort,  369,  371,  699. 

Szekeres, György (= George), 66.

Szemerédi, Endre, 228, 549, 673, 740.

Szpankowski,  Wojciech,  726,  727,  728. 
T-fifo trees, 310-311.
  strongly, 310-311, 345, 348.

T-lifo trees, 305-310, 346, 348.

Tableaux,  47-72,  240,  670-671. 

Tables, 392.
  of numerical quantities, 748-751.

Tag  sorting,  see  Keysorting. 

Tail  inequalities,  379,  636. 

Tainiter,  Melvin,  740. 

Takács, Lajos, 745.

Tamaki, Jeanne Keiko, 454.
Tamari, Dov, born Bernhard Teitler, 718.

Tamminen,  Markku,  176-177,  179. 

Tan, Kok Chye, 457, 711.

Tangent  numbers,  602,  610-611. 

Tanner,  Robert  Michael,  660. 

Tannier,  Eric,  615. 

Tanny,  Stephen  Michael,  606. 

Tape  searching,  403-407. 

Tape splitting, 281-287.
  polyphase merge, 282-285, 287, 298, 326-327, 333, 338.

Tapes,  see  Magnetic  tapes. 

Tardiness,  407. 

Tarjan, Robert Endre, 152, 214, 215, 403, 477, 478, 549, 583, 590, 649, 652, 713, 718, 722.

Tarter,  Michael  Ernest,  99. 

Tarui, Jun, 230.

Telephone  directories,  409,  561,  573. 
Tengbergen,  Cornelia  van  Ebbenhorst,  744. 
Tenner,  Bridget  Eileen,  669. 

Tennis  tournaments,  207-208,  216. 
Terabyte  sorting,  390. 

Ternary  comparison  trees,  194. 

Ternary  heaps,  157. 

Ternary  trees  for  tries,  512. 

Terquem,  Olry,  591. 

Tertiary  clustering,  554. 

Testing  several  conditions,  406. 

Teuhola,  Jukka  Ilmari,  649. 

TeX, iv, vi-vii, 531, 722, 782.
C   t     Description                   OP(F)
00  1     No operation                  NOP(0)
01  2     rA ← rA + V                   ADD(0:5), FADD(6)
02  2     rA ← rA − V                   SUB(0:5), FSUB(6)
03  10    rAX ← rA × V                  MUL(0:5), FMUL(6)
04  12    rA ← rAX/V, rX ← remainder    DIV(0:5), FDIV(6)
05  10    Special                       NUM(0), CHAR(1), HLT(2)
06  2     Shift M bytes                 SLA(0), SRA(1), SLAX(2), SRAX(3), SLC(4), SRC(5)
07  1+2F  Move F words from M to rI1    MOVE(1)
08  2     rA ← V                        LDA(0:5)
09  2     rI1 ← V                       LD1(0:5)
10  2     rI2 ← V                       LD2(0:5)
11  2     rI3 ← V                       LD3(0:5)
12  2     rI4 ← V                       LD4(0:5)
13  2     rI5 ← V                       LD5(0:5)
14  2     rI6 ← V                       LD6(0:5)
15  2     rX ← V                        LDX(0:5)
16  2     rA ← −V                       LDAN(0:5)
17  2     rI1 ← −V                      LD1N(0:5)
18  2     rI2 ← −V                      LD2N(0:5)
19  2     rI3 ← −V                      LD3N(0:5)
20  2     rI4 ← −V                      LD4N(0:5)
21  2     rI5 ← −V                      LD5N(0:5)
22  2     rI6 ← −V                      LD6N(0:5)
23  2     rX ← −V                       LDXN(0:5)
24  2     M(F) ← rA                     STA(0:5)
25  2     M(F) ← rI1                    ST1(0:5)
26  2     M(F) ← rI2                    ST2(0:5)
27  2     M(F) ← rI3                    ST3(0:5)
28  2     M(F) ← rI4                    ST4(0:5)
29  2     M(F) ← rI5                    ST5(0:5)
30  2     M(F) ← rI6                    ST6(0:5)
31  2     M(F) ← rX                     STX(0:5)
32  2     M(F) ← rJ                     STJ(0:2)
33  2     M(F) ← 0                      STZ(0:5)
34  1     Unit F busy?                  JBUS(0)
35  1+T   Control, unit F               IOC(0)
36  1+T   Input, unit F                 IN(0)
37  1+T   Output, unit F                OUT(0)
38  1     Unit F ready?                 JRED(0)
39  1     Jumps                         JMP(0), JSJ(1), JOV(2), JNOV(3), also [*] below
40  1     rA : 0, jump                  JA[+]
41  1     rI1 : 0, jump                 J1[+]
42  1     rI2 : 0, jump                 J2[+]
43  1     rI3 : 0, jump                 J3[+]
44  1     rI4 : 0, jump                 J4[+]
45  1     rI5 : 0, jump                 J5[+]
46  1     rI6 : 0, jump                 J6[+]
47  1     rX : 0, jump                  JX[+]
48  1     rA ← [rA]? ± M                INCA(0), DECA(1), ENTA(2), ENNA(3)
49  1     rI1 ← [rI1]? ± M              INC1(0), DEC1(1), ENT1(2), ENN1(3)
50  1     rI2 ← [rI2]? ± M              INC2(0), DEC2(1), ENT2(2), ENN2(3)
51  1     rI3 ← [rI3]? ± M              INC3(0), DEC3(1), ENT3(2), ENN3(3)
52  1     rI4 ← [rI4]? ± M              INC4(0), DEC4(1), ENT4(2), ENN4(3)
53  1     rI5 ← [rI5]? ± M              INC5(0), DEC5(1), ENT5(2), ENN5(3)
54  1     rI6 ← [rI6]? ± M              INC6(0), DEC6(1), ENT6(2), ENN6(3)
55  1     rX ← [rX]? ± M                INCX(0), DECX(1), ENTX(2), ENNX(3)
56  2     CI ← rA(F) : V                CMPA(0:5), FCMP(6)
57  2     CI ← rI1(F) : V               CMP1(0:5)
58  2     CI ← rI2(F) : V               CMP2(0:5)
59  2     CI ← rI3(F) : V               CMP3(0:5)
60  2     CI ← rI4(F) : V               CMP4(0:5)
61  2     CI ← rI5(F) : V               CMP5(0:5)
62  2     CI ← rI6(F) : V               CMP6(0:5)
63  2     CI ← rX(F) : V                CMPX(0:5)

[*]: JL(4) <   JE(5) =   JG(6) >   JGE(7) ≥   JNE(8) ≠   JLE(9) ≤
[+]: N(0)   Z(1)   P(2)   NN(3)   NZ(4)   NP(5)

General form:
C = operation code, (5:5) field of instruction
F = op variant, (4:4) field of instruction
M = address of instruction after indexing
V = M(F) = contents of F field of location M
OP = symbolic name for operation
(F) = normal F setting
t = execution time; T = interlock time

rA = register A
rX = register X
rAX = registers A and X as one
rIi = index register i, 1 ≤ i ≤ 6
rJ = register J
CI = comparison indicator

Character codes: 48 =   49 $   50 <   51 >   52 @   53 ;   54 :   55 '
theory and practice for students, researchers, and practitioners alike.

The  bible  of  all  fundamental  algorithms  and  the  work  that  taught  many  of  today’s  software 
developers  most  of  what  they  know  about  computer  programming. 

— Byte,  September  1995 

Countless  readers  have  spoken  about  the  profound  personal  influence  of  Knuth’s  work.  Scientists 
have  marveled  at  the  beauty  and  elegance  of  his  analysis,  while  ordinary  programmers  have 
successfully  applied  his  “cookbook”  solutions  to  their  day-to-day  problems.  All  have  admired  Knuth 
for  the  breadth,  clarity,  accuracy,  and  good  humor  found  in  his  books. 

I can’t  begin  to  tell  you  how  many  pleasurable  hours  of  study  and  recreation  they  have  afforded 
me!  I have  pored  over  them  in  cars,  restaurants,  at  work,  at  home  . . . and  even  at  a Little  League 
game  when  my  son  wasn’t  in  the  line-up. 

— Charles  Long 

Although the books were written primarily as a reference, some people have nevertheless found it possible and interesting to
read  each  volume  from  beginning  to  end.  A programmer  in  China  even  compared  the  experience 
to  reading  a poem. 

If  you  think  you’re  a really  good  programmer . . . read  [Knuth’s]  Art  of  Computer 
Programming.  ...  You  should  definitely  send  me  a resume  if  you  can  read  the  whole  thing. 

— Bill  Gates 

Whatever your background, if you need to do any serious computer programming, you will find
your  own  good  reason  to  make  each  volume  in  this  series  a readily  accessible  part  of  your  scholarly 
or  professional  library. 

It’s  always  a pleasure  when  a problem  is  hard  enough  that  you  have  to  get  the  Knuths  off  the 
shelf.  I find  that  merely  opening  one  has  a very  useful  terrorizing  effect  on  computers. 

— Jonathan  Laventhol 

For  the  first  time  in  more  than  20  years,  Knuth  has  revised  all  three  books  to  reflect  more  recent 
developments  in  the  field.  His  revisions  focus  specifically  on  those  areas  where  knowledge  has 
converged since publication of the last editions, on problems that have been solved, on problems
that  have  changed.  In  keeping  with  the  authoritative  character  of  these  books,  all  historical 
information  about  previous  work  in  the  field  has  been  updated  where  necessary.  Consistent  with 
the  author’s  reputation  for  painstaking  perfection,  the  rare  technical  errors  in  his  work,  discovered 
by  perceptive  and  demanding  readers,  have  all  been  corrected.  Hundreds  of  new  exercises  have 
been  added  to  raise  new  challenges. 